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FOREWORD 


Phase  and  frequency  tracking  problems  comprise  some  of  the  most 
nettlesome  nonlinear  filtering  problems  in  the  realm  of  signal  processing. 
These  problems  have  held  the  interest  of  control  and  communication 
theorists  at  least  since  1953/54  when  Lehan  and  Parks,  and  Youla  published 
their  work  on  maximum  likelihood  and  optimum  demodulation  on  an  interval. 
Over  the  years  Cox,  Viterbi,  Cahn,  Forney,  and  a  host  of  others  have 
advocated  dynamic  programming  for  the  solution  of  nonlinear  filtering 
problems.  This  research  follows  that  tradition. 

Dynamic  Programming  is  advocated  as  a  technique  for  finding  the 
maximum  a  posteriori  (HAP)  phase  or  f requency  modulated  sequence  to  pass 
through  a  data  set.  The  key  idea  is  to  pose  a  Markov  chain  model  on  the 
circle  (0,2ir)  for  phase  or  frequency,  and  then  generate  candidate  HAP 
sequences  that  are  consistent  with  the  data  and  the  a  priori  probability 
structure. 
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INTRODUCTION 


Phase  and  Frequency  tracking  are  the  classic  nonlinear  filtering  problems. 

They  arise  in  narrowband  analog  communication,  data  transmission,  and  spread 
spectrum  communication.  As  usually  stated,  the  problem  is  to  obtain  a  causal 
estimate  of  the  phase  or  frequency  baaed  on  noisy  phase  modulated  observations. 

The  best  known  solutions  are  phase-locked  loops  (PLL's). 

In  any  truly  nonlinear  filtering  approach  to  optimum  phase  tracking,  the 
basic  problem  is  to  propagate  an  a  posteriori  density,  conditioned  on  an  increasing 
measurement  record,  much  as  is  done  in  Kalman  Filtering.  Unfortunately,  there 
exist  no  finite-dimensional  schemes  for  propagating  the  exact  conditional  density 
or  for  propagating  a  finite-dimensional  sufficient  statistic.  One  must  approximate. 

Under  this  contract  we  have  developed  an  approach  to  phase  and  frequency 
sequence  estimation  (emphasis  on  the  word  sequence)  that  has  its  logical  antecedents 
in  the  filtering  philosophy  of  Youla  and  the  data  decoding  philosophy  of  l/iterbi. 

We  have  posed  a  maximum  a  posteriori  probability  (HAP)  sequence  estimation  problem 
that  leads  to  nonlinear  MAP  equations  not  unlike  the  continuous-time  MAP  interval 
equations.  We  have  derived  dynamic  programming  algorithms  to  efficiently  solve  for 
survivor  phase  and  frequency  sequences  that  approximate  the  desired  MAP  sequence. 

The  algorithms  also  provide  a  handy  mechanism  for  generating  fixed-lag  phase 
estimates,  although  this  is  not  the  problem  for  which  the  algorithm  is  derived. 

In  a  loosely  related  set  of  problems  we  have  studied  exact  likelihood  for 
autoregressive  moving  average  (ARMA)  processes.  We  have  derived  fast  algorithms 
for  constructing  likelihood,  and  established  interesting  connections  between  the 
work  of  Wold,  Kolmogorov,  Wiener ,  and  Kalman.  A  fast  Kalman  filter  has  been  realized 
in  16-bit  arithmetic  on  an  8086  microprocessor. 

In  the  sections  to  follow,  we  outline  the  problems  studied  and  summarize 
important  results. 


STATEMENT  OF  PROBLEMS  STUDIED 


We  summarize  here  the  main  problems  studied  under  this  contract. 

Phase  Modelling.  For  phase  and  frequency  tracking  the  first  problem  to  be  studied 
is  one  of  deriving  suitable  models  for  random  phase  and/or  random  frequency 
modulation. 

Dynamic  Programming  Algorithm  Development.  Once  a  phase  model  is  derived,  the 
next  problem  is  to  derive  a  likelihood  function  and  find  a  dynamic  programming 
algorithm  to  find  the  maximum  of  likelihood. 

Performance  Evaluation  for  Phase  and  Frequency  Sequence  Estimation.  The  next 
problem  is  to  simulate  the  dynamic  programming  algorithms  on  stochastic 
data  and  calculate  Monte-Carlo  performance  results. 

Simultaneous  Phase  Tracking  and  Data  Decoding.  When  complex  data  are  transmitted 
over  phase  jitter  channels,  there  arises  the  problem  of  simultaneously  tracking 
phase  and  decoding  data  symbols.  The  problem  is  to  derive  a  joint  likelihood 
function  for  phase  and  symbol  sequences,  maximize  it  with  a  dynamic  program 
algorithm,  and  compute  Monte-Carlo  performance  results. 

Maximum  Likelihood  Identification  of  ARMA  Systems.  The  problem  here  is  to  derive 
a  fast  algorithm  to  compute  likelihood  for  autoregressive  moving  average  (ARMA) 
sequences. 

Fixed  Point  Implementation  of  Kalman  Filters.  The  fast  Kalman  gain  algorithm  is 
a  fixed  point  algorithm  ideally  suited  for  computation  on  a  fixed  point  machine. 
But  the  problems  of  scaling  and  rounding  remain.  The  question  here  is  one  of 
deriving  scaling  rules  and  calculating  rounding  error  variances  in  time  varying 
Kalman  filters. 
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SUMMARY  OF  MOST  IMPORTANT  RESULTS 


The  most  important  results  of  this  study  are  summarized  below. 

Phase  Modelling.  We  have  derived  phase  models  for  random  phase,  random  FM, 
and  random  chirp  modulation.  Each  model  is  a  Markov  chain  defined  on  a 
cyclic  group.  Corresponding  correlation  and  spectral  results  have  been 
derived.  The  results  generalize  existing  results  on  the  spectral  theory  of 
chains,  and  leave  one  with  the  problem  of  selecting  states,  transition  prob¬ 
abilities,  and  run  lengths  to  achieve  model  matching  with  more  conventional 
models.  The  results  apply  for  coherent  and  noncoherent  FM.  The  figure  on  the 
following  page  gives  a  geometric  picture  of  the  kinds  of  phase  and  frequency 
models  we  have  used  in  most  of  our  work  on  algorithm  development,  phase  and 
frequency  tracking,  and  simultaneous  phase  tracking  and  data  decoding.  See 
references  1,2,  and  3  for  additional  details. 

Dynamic  Programming  Algorithm  Development.  The  basic  measurement  model  in  all 
of  our  work  has  been  the  fol lowing : 


zt  =  at  +  nt 

afc  j  symbol  drawn  from  a  finite  alphabet 

(|),  :  either  a  directly  modulated  phase  sequence  for  which 
we  know  the  transition  probability  density  P^t+l^t^ 
or  a  function  (JKO^)  of  a  frequency  sequence 

for  which  we  know  the  transition  probabilty  density 

pK+i/«t) 

n.  :  a  sequence  of  independent  and  identically  distributed 
normal  random  variables 

With  this  model  we  have  derived  expressions  for  likelihood  and  found  dynamic 
programming  algorithms  for  exactly  maximizing  or  approximately  maximizing 
likelihood.  Generally  the  algorithms  take  the  form: 


max  max  +  In  pC^/k.j)  +  g(+K> 

K,?K-1  V  >K-2 


The  function  g{.)  depends  on  the  details  of  the  problem.  See  references  1,2, 
and  3  for  details  about  selecting  g(.)  and  implementing  the  algorithm  on  a 
finite  trellis.  The  function  L  is  likelihood. 

It  is  our  opinion  that  a  variety  of  filtering  problems  in  signal  and  image 
processing  can  be  reformulated  as  sequence  or  interval  estimation 'problems 
for  which  likelihood  can  be  derived  and  for  which  algorithms  can  be  found 
for  approximating  the  maximum. 


Constant  Phase 


Random  Walk  Phase 


Constant  Frequency  it.  Random  Walk  Frequency 

z.  =  e-Tt  +  n. 


♦t  2  *<*V 

A  K  A 

Zt  ^1*  ^2*  *** 

niA*r  and  FRCQnrNTY  fionci  r, 
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Performance  Evaluation  for  Phase  and  Frequency  Sequence  Estimation.  Our 
results  are  nicely  summarized  on  the  graphs  of  the  following  pages.  The 
first  compares  estimation  error  variance  for  the  dynamic  programming  (or 
Viterbi)  solution  with  a  host  of  other  algorithms  ranging  from  the  phase 
lock  loop  to  the  point  mass  filter  and  the  Fourier  coefficient  filter. 

The  results  apply  to  the  problem  of  random  walk  phase  tracking. 

The  next  graph  shows  output  SNR  versus  input  CNR  for  sinusoidal  modulation 
of  a  carrier.  We  have  adapted  our  random  walk  FM  frequency  tracker  to  this 
problem  and  compared  its  performance  with  linear  prediction  trackers,  and 
the  trackers  of  Tufts  and  of  Toomey  and  Short. 

Our  performance  results  indicate  that  sequence  estimation  by  the  method  of 
dynamic  programming  to  maxinizie  likelihood  on  a  finite  trellis  provides  a 
way  of  improving  on  the  performance  of  more  classical  causal  estimators. 

This  improvement  can  be  significant  at  low  SNR. 

Simultaneous  Phase  Tracking  and  Data  Decoding.  The  performance  results  for 
this  problem  are  contained  in  Reference  2,  where  a  variety  of  binary,  phase 
shift  keying,  and  quadrature  shift  keying  communication  problems  are  considered. 
The  third  figure  in  the  sequence  of  three  figures  that  appears  on  the  next 
three  pages  shows  just  one  of  the  many  examples  contained  in  Reference  2. 

The  graph  shows  how  two  simultaneous  phase  trackers  and  data  decoders,  namely 
the  Viterbi  tracker  and  the  jitter  equalizer,  achieve  performance  very  close 
to  that  achievable  under  coherent  phase  conditions.  The  results  apply  to 
the  decoding  of  8-ary  phase  shift  keyed  symbols. 


6 


5 


*  i - J  l  I  *  I  « 

•4-3-2-10  1  2  3 

10  ;»•<#•*  r*)^* 


JL. 

4 


X. 

3 


6 


€ 
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Maximum  Likelihood  Identification  of  ARMA  Systems.  We  have  followed  the  lead 
of  Akaike  and  Anderson  and  Moore  to  write  down  the  innovations  representation 
that  reproduces  the  second  order  statistics  of  a  stationary  ARMA  sequence.  We 
have  then  associated  the  gain  of  a  Kalman  filter  with  the  triangular  square 
root  of  a  Toeplitz  matrix  to  rederive  Morf's  fast  Kalman  filter  algorithm. 

The  result  is  a  fast  algorithm  for  implementing  likelihood.  The  results  are 
summarized  in  References  4  and  5. 

Fixed  Point  Implementation  of  Kalman  Filters.  Beginning  with  the  innovations 
representation  of  a  stationary  ARMA  sequence,  we  have  derived  scaling  rules  to 
prevent  overflow  in  time  varying  Kalman  filters  and  derived  formulas  for 
rounding  error  variance.  The  scaling  rule  is 


a44>%.  jl£± 


s(k):  inverse  of  time  varying  scale  constant 

q(k,k)  s  (k,k)th  element  of  the  state  variance  matrix 

(L  2m_1  :  dynamic  range  of  the  fixed  point  representation 

$  :  design  parameter  that  allows  designer  to  control 

the  probability  of  overflow 

This  formula  generalizes  the  results  of  Mullis  and  Roberts  to  time  varying 
cases. 

The  figure  on  the  following  page  illustrates  our  experimental  setup  for 
implementing  the  Kalman  filter  on  an  8086  microprocessor.  The  figure  on 
the  next  page  shows  a  typical  simulation  showing  performance  on  the  fixed 
point  machine  with  that  achievable  on  a  floating  point  machine.  The  results 
apply  to  one-step  prediction.  The  circles  highlight  places  where  the 
fixed  point  and  floating  point  results  differ  by  more  than  1  bit  in  6. 
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I.  Introduction 

Phase  tracking  is  the  classic  nonlinear  filtering  problem,  it 
arises  in  narrowband  analog  communication,  data  transmission, 
and  spread  spectrum  communication.  As  usually  stated,  the 
problem  is  to  obtain  a  causal  estimate  of  the  phase  based  on 
noisy  phase-modulated  observations.  The  best  known  solutions 
are  phase-locked  loops  (PLL’s). 

In  any  truly  nonlinear  filtering  approach  to  optimum  phase 
tracking,  the  basic  problem  is  to  propagate  the  a  posteriori 
density  of  the  phase,  conditioned  on  an  increasing  measurement 
record,  much  as  is  done  in  Kalman  filtering.  Unfortunately, 
there  exist  no  finite-dimensional  schemes  for  propagating  the 
exact  conditional  density  or  for  propagating  a  finite-dimensional 
sufficient  statistic.  One  must  approximate.  The  interested  reader 
may  consult  [18]  for  a  review  of  the  best  known  techniques  or, 
better  yet,  go  directly  to  the  appropriate  source  [1J-|  13]. 

In  this  correspondence  we  propose  an  approach  to  phase 
sequence  estimation  (emphasis  on  the  word  sequence)  that  has  its 
logical  antecedents  in  the  filtering  philosophy  of  Youla  [2]  and 
the  data  decoding  philosophy  of  Viterbi  [14].  We  pose  a  maxi¬ 
mum  a  posteriori  probability  (MAP)  sequence  estimation  prob¬ 
lem  that  leads  to  nonlinear  MAP  equations  not  unlike  the 
continuous-time  MAP  interval  equations.  Fortunately  there  ex¬ 
ists  a  dynamic  programming  algorithm  to  efficiently  solve  for 
survivor  phase  sequences  that  approximate  the  desired  MAP 
sequence.  The  algorithm  also  provides  a  handy  mechanism  for 
generating  fixed-lag  phase  estimates,  although  this  is  not  the 
problem  for  which  the  algorithm  is  derived.  As  is  common  in 
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most  of  ihe  current  communications  literalure  we  call  our  dy¬ 
namic  programming  algorithm  a  Vilerbi  algorithm. 

Cahn  1 1 5)  has  suggested  that  phase  may  be  tracked  with  delay 
in  order  to  extend  the  so-called  threshold.  He  proposes  a  Viterbi- 
like  algorithm  for  tracking  carrier  phase  sequences  whose  reali¬ 
zations  satisfy  dynamics  constraints.  There  is  certainly  a  philo¬ 
sophical  link  between  Cahn’s  work  and  ours.  In  fact  it  was 
Cahn's  paper  that  first  aroused  our  interest  in  phase  sequence 
estimation.  However  the  approaches  are  really  quite  different 
Ungerboeck  [16]  has  proposed  an  algorithm  for  phase  tracking 
that  makes  use  of  a  delta-modulation  approximation  to  the 
phase  sequence  and  an  approximate  version  of  the  Viterbi  algo¬ 
rithm.  Tufts  and  Francis  [17]  have  also  recently  proposed  an 
algorithm  for  obtaining  smoothed  phase  estimates. 

II.  The  Basic  Problem 

Let  (Zk)  denote  the  complex  observation  sequence 

Z*-e**  +  tf*,  *-l,2,--, 

where 

Nk-  Uk+jVk.  Af*i-LAf,  for  k  W, 

Uk :  SS.(0,<7„J),  Vt  °H(0,ol), 

UkX±y,  for  all  k,/,  (1) 

(<t>k)  is  a  discrete-time  phase  sequence  to  be  discussed  shortly 
and  [Nt)  is  an  additive  noise  sequence  of  independent  identi¬ 
cally  distributed  (i.i.d.)  normal  random  variables.  Our  notation  is 
that  Nk±±N,  means  Nt  and  N,  are  independent,  and 
Uk :  N( 0,  o„l)  indicates  that  Uk  is  a  normal  random  variable  with 
mean  0  and  variance  of.  The  sequence  {Zk)  may  be  thought  of 
as  a  complex  representation  for  the  sample  values  appearing  at 
the  output  of  a  quadrature  demodulator.  The  problem  is  to 
estimate  a  realization  of  the  entire  sequence  {♦*)*,  say  {+*}*, 
from  the  measurement  record  {**},.  It  turns  out  that  this 
formulation  also  provides  a  convenient  way  to  generate  a  se¬ 
quence  of  fixed-lag  estimates.  However,  we  emphasize  that  the 
basic  problem  under  investigation  is  one  of  estimating  an  entire 
sequence,  not  one  of  generating  a  sequence  of  fixed-lag,  fixed- 
interval,  or  fixed-point  smoothing  solutions.  We  make  the  obvi¬ 
ous  but  important  observation  that  the  signal  model  (I)  is 
invariant  under  a  modulo-2*  transformation  on  the  phase. 


HI.  Random  Walk  on  the  Circle  as  a  Model  for 
Phase  Noise 

The  first,  seemingly  natural,  choice  for  a  random  phase  model 
is  the  Wiener  process  fV(t)  with  incremental  variance  «£.  This  is 
the  most  commonly  used  model  for  random  phase  acquisition. 
Most  of  the  results  in  this  correspondence  may  be  obtained  in  a 
formal  way  using  this  phase  model,  but  certain  technical  difficul¬ 
ties  arise.  First,  there  is  no  statkmaiy  distribution  and.  second, 
there  is  no  rigorous  way  of  defining  a  unique  conditional  proba¬ 
bility  for  transitions  from  a  modulo-2*  value  of  W(r)  to  another 
modulo-2*  value  at  a  later  time  t  -f  r.  The  latter  difficulty  is 
particularly  troublesome  as  one  of  the  crucial  parts  of  our 
modulo-2*  phase  sequence  estimator  is  a  transition  probability 
matrix  that  characterizes  phase  transitions  between  modulo-2* 
values.  By  modeling  phase  as  a  random  walk  on  the  circle  we 
avoid  these  technical  difficulties.  Other  authors  (see  for  example 
[7|)  have  also  noted  that  the  circle  is  the  appropriate  domain  on 
which  to  study  modulo-2*  type  sequences. 

Let  ♦(/)  be  a  random  walk  on  the  circle,  taking  values  in 
[  -  *,*).  Denote  by  the  function  p(4i/4<)  the  conditional  density 
of  a  transition  from  the  value  ♦(.*)— q,  at  time  s  to  the  value 
*U)Bgk  at  time  t  >s.  This  conditional  density  satisfies  the 
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partial  differential  equation  [20j 

j'Pi*,/*,)-  <2> 

where  is  the  in/initessimal  variance.  This  equation  holds  in 
the  strip  -*<4><*,  t>s.  for  any  fixed  -*<4,<*.  The 
boundary  conditions  are 

lim  p(4,/  4,)  “  8(4i  -  4,). 

t~*S 

/>(4i-  -*/4,)“/>(4,”  V4,). 

(3) 

where  S  is  the  Dirac  delta  function.  We  are  using  the  convention 
that  ♦(/)  denotes  a  random  variable  and  4,  a  realization.  When 
the  context  is  clear  and  there  is  no  danger  of  confusion  we  will 
sometimes  make  no  distinction  in  notation  between  a  random 
variable  and  its  realizations.  The  same  cautionary  note  holds  for 
the  discrete-time  random  variable  Qk  and  the  realization  4A. 

The  solution  for  /<)>,)  is 

/>(4i,/4,)-  ~  ' 

ylvai(t-s) 

It  is  easily  seen  that  the  process  <h(f)  is  conditionally  approxi¬ 
mately  9t(^.  -.»)),  given  4(j)“4f  for  small  i—s.  An  eigen¬ 

function  expansion  of  the  following  form  is  also  useful: 

P(*»/+,)“jZ  2  «p(-»2«*>('-j)/2)«p(./«(4i-4,))- 

»-  “00 

(5) 

This  is  simply  Poisson’s  summation  formula  for  (4).  From  this 
expression  it  is  clear  that  ♦(/)  becomes  uniformly  distributed  as 
t  -  r— *oo.  Equation  (S)  has  also  been  noted  in  [7]. 

Consider  the  discrete-time  sequence  {♦*}  obtained  by  sam¬ 
pling  ♦(/)  at  the  periodic  sampling  instants  l<*kT,  *-0,  l,  - . 
Call  a  realization  of  4A.  The  transition  density  from  4A_,  to 
4a  is  found  from  (4)  with  o*"“o£T  to  be 


P<4*/4*-,)- 


(6) 


By  the  Markov  property  of  •(!)  it  follows  that  {♦*}  is  a 
Markov  sequence  for  which  the  joint  distribution  of  {♦*}*  may 
be  written 

* 

p({**}?)-  n /(♦»/♦»-,).  (7) 

where 


P<4 1/*>):  </[-*.*). 


and  the  notation  p{  4,/4,):  £/(-*,  *)  indicates  that  ♦,  is  uni- 
fonnily  distributed  on  (-*,  *).  Other  choices  are  also  admissa- 
Me:  for  example,  p(4i/4«)a>4(4|-4o)  4d  known  corre¬ 
sponds  to  a  given  initial  phase. 

One  may  obtain  the  same  discrete-time  model  for  {♦*}  by 
considering  to  be  a  modulo-2*  version  of  the  following 
discrete-time  random  walk: 

+  Wkl±W,totk+l. 

Wk :  ^(O.ojr).  ei.-Oo'T.  (8) 


The  modulo-2  *  version  of  H, .  call  il  HA.  may  be  written 

,  +  mv  (9) 

Given  0A  . ,.  the  random  variable  Aa  is  'X  ( 9k  |.oi  ).  As 
SA  *s  a  modulo-2*  version  of  6A  it  follows  that  the  conditional 
density  of  @A ,  given  SA  ,  *  0t  ,.  is  the  folded  normal  density  of 
(6).  For  this  reason  we  will  often  call  the  discrete-time  process 
on  the  circle  (<hA)  a  modulo-2*  version  of  the  discrete-lime 
random  walk  (HA }. 

In  Section  VII  we  discretize  the  phase  space  |  -*.  *)  to  phase 
values  m«0, •  ■  .  M  1  with  M  odd  It  is  then  necessary  to 
characterize  the  transition  probability  from  (,  to  £m  for  all  M 2 
pairs  of  ((;.{„).  We  choose  for  our  definition  of  this  transition 
probability 

P"(4*-£,^4*-i  =  £i>“^/><4»-£.,/4*  i “ C/ )  (10) 

with  b,  selected  so  that 

m- t 

2  P(*A-lmU*  !“€/)- I.  (“0,1,-  •  •  ,  M-  I.  (II) 

The  sum  on  n  in  (6)  must,  of  course,  be  truncated.  This  trunca¬ 
tion  may  be  selected  to  give  the  desired  accuracy  before  the 
algorithm  of  Section  VU  is  run.  There  is  no  series  truncation 
whatsoever  in  the  algorithm  itself. 

There  is  an  important  symmetry  property  of  (10).  If  the  (,  are 
equally  spaced  points  on  [-*,*),  for  example  {,— llit/M- 
(M-  1  )v/M,  the  function  p  depends  only  on  |{„— {/|.  Thus  if 
the  values  of  (10)  are  organized  into  an  MxM  matrix  of 
transition  probabilities,  the  matrix  is  Toeplitz.  We  may  compute 
the  ^-dimensional  vector  Q“(q 0,  q,,-  ■  ■ ,  q^-,)  with  qmmp(+k 
-to /♦*- 1  -£„)  “d  ob**in  any  value  of  p(4*-£m/4*-i  -{,)  as 

with  n—|m-/|.  In  this  way  only  an  M-vector  of  transition 
probabilities  need  be  stored  for  cyclic  reading. 

IV.  The  MAP  Sequence  Estimation  Problem  for 
Modulo-2*  Phase 

Consider  the  following  maximization  with  respect  to  the  mod¬ 
ulo-2*  phase  sequence  {4*}*: 

ma xp((zk)?.«.k}f),  (12) 

(♦.If 

where  />(•,  •)  is  tile  joint  density  function  for  the  K  measure¬ 
ments  (z*}f  and  the  K  modulo-2*  phase  values  {4*}*.  Maximi¬ 
zation  of  this  joint  density  function  is  equivalent  to  maximi¬ 
zation  of  the  a  posteriori  density  /’({4A)f/{zA)f.  The  joint  den¬ 
sity  in  (12)  may  be  written 

p(MfA+k)f)-p(Mf/(**)f)p({+»)f) 

*f((4i)f)J\(r^,(/),  (13) 

where  the  last  line  follows  since  the  Nk  in  (I)  are  i.i.d.  normal 
random  variables  and  because,  conditionally, 

/»(**/♦*):  «.„*).  (14) 

Here  <DL,t(eJ*k,  o„2)  indicates  that  the  conditional  density  of  the 
complex  random  variable  Zk  (conditioned  on  4*)  is  normal  with 
mean  et*k  and  variance  of  : 

^.(•/^.«l.,)-(2^,)',eRp{--i;|g*-«^|*}.  (15) 

Dropping  phase  independent  terms  we  may  write  the  MAP 
sequence  estimation  problem  as 


max  l\ 

(♦.If 
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where 

.  A  A 

1a“  j  Re  X  A i  4  2  !*»/»( <#>*/<>*  ,).  (»6) 

0,  *  -  I  A  -  I 

A*  is  the  phase-corrected  vector 

A  K 

A*-  2  <•*,'<*•  ♦*>-  2  v**.  (17) 

A- I  k- I 

and  c*  and  ^  are,  respectively,  envelope  and  phase  variables: 
zk~ckeJ*',  — «•, #),  c*e(0,oo).  It  is  clear  from  this  form 
that  the  MAP  phase  sequence  will  be  one  that  stays  reasonably 
close  to  the  noisy  phase  variables  ipk  (to  make  cos(i pk-$k)  large) 
while  also  maintaining  a  trajectory  that  is  a  priori  reasonably 
likely.  Thus  the  MAP  sequence  strikes  a  balance  between  what 
the  noisy  data  <(ik  says  the  phase  is  doing  and  what  the  transition 
probabilities  />(♦*/♦*_ i)  say  the  phase  can  do.  When  the  en¬ 
velope  ck  is  large  there  is  more  of  a  tendency  to  believe  the 
measured  ifrk-  This  curious  effect  may  be  explained  by  noting 
that  the  phase  statistic  is  a  modulo-2  w  unbiased  estimate  of 
♦*  with  a  variance  that  decreases  approximately  inversely  with 
increasing  ck  (18]. 


V.  Characteristics  of  the  MAP  Sequence 

Given  the  envelope  and  phase  variables  {c*)f  and  the 

MAP  phase  sequence  may  be  obtained  by  equating  the 

derivatives  of  T,  to  zero: 

3^top(+*/£*- 1 )  +  lnp(£*+ ■/♦*  ) 

+  k-l,2,  -  ,K.  (18) 


The  boundary  conditions  are  r0— 0  and 

2)  0^I,1p(^a+i/^a)",O- 


(1») 


Condition  1)  simply  reflects  the  fact  that p(*,/4 0)  •»  uniform  on 
l_*.  *)-  Condition  2)  is  a  mathematical  convenience  that  allows 
ua  to  put  all  the  equations  of  (18)  in  the  same  form.  Of  course  4o 
and  ,  are  not  computed  from  the  data  (ck)f  and 
Equations  (18)  are  nonlinear  equations  with  two-point 
boundary  conditions.  They  are  analogous  to  the  continuous-time 
MAP  equations  obtained  for  phase  tracking  on  an  interval. 
While  we  cannot  solve  the  equations  of  (18)  explicitly  we  can 
make  some  very  interesting  observations  regarding  the  properties 
of  the  MAP  phase  sequence. 

It  is  easily  verified  from  the  conditional  density  of  (6)  that 

(20) 


Therefore  when  the  K  equations  of  (18)  are  summed  and  the 
boundary  conditions  applied,  all  terms  involving  lnp(4*.»i/^*) 
cancel.  The  sum  on  the  terms  involving  lm(A,-A*_,)  tele¬ 
scopes,  and  we  are  left  with  the  result 

lmA,-0.  (21) 

Here  A*  is  A,  with  p,  set  to  the  MAP  estimate  for  *« 
1,2,-  •  • ,  K.  This  allows  us  to  make  the  following  observation: 
while  maximizing  the  objective  function  P*.  the  MAP  sequence 
(♦*)f  yields  a  maximum  value  for  r*  of 

<V--jReA,+  £  lnp(**+,/4»)  (22) 

a- i 


the  condition  1m  A* “0  (e.g„  the  sequence  of  maximum  likeli¬ 
hood  estimates  4>*  — 1£*)>  but  these  sequences  do  not  also  maxi¬ 
mize  r*. 


VI.  The  MAP  Sequence  for  Fixed  Phase  Acquisition 

Suppose  the  underlying  phase  sequence  ($*}*  is  known  to  be 
a  constant  sequence  with  the  value  of  the  constant  uniformly 
distributed  on  [— w,  *].  In  this  case  the  MAP  sequence  estimate 
is  identical  with  the  maximum  likelihood  (ML)  estimate  of  an 
unknown  phase  parameter  +  in  a  complex  normal  model.  For 
this  reason,  and  for  the  insight  it  gives  into  phase  estimation,  we 
include  in  the  following  paragraphs  a  short  discussion  of  con¬ 
stant  phase  and  envelope  models.  The  inclusion  of  an  unknown 
envelope  c  generalizes  the  discussion  without  changing  the  na¬ 
ture  of  the  phase  estimate.  This  follows  from  the  fact  that  the 
phase  estimate  is  uncoupled  from  the  envelope  estimate.  The 
converse  is  not  true. 

Consider  the  joint  density  function  for  the  data  (Z*}f,  para¬ 
meterized  by  the  envelope  c  and  the  phase  4>: 


-«/(♦, cj^tztjf^xpj^Re ce 2  ** |  (24) 

where 

*({**}fr)“exp|  -  l**l2  J 

It  follows  from  the  factorization  theorem  (21,  p.  IIS]  that  the 
complex  statistic  K~  'Zf.i  Zk  is  sufficient  for  the  parameter  pair 
(c,  +).  The  ML  estimate  for  the  composite  parameter  a  —  ceJ* 


is 


a-K 


K 

2 

A-  I 


(26) 


This  ML  estimator  is  consistent,  unbiased,  efficient,  and  mini¬ 
mum^  variance  unbiased.  The  corresponding  ML  estimates  for  c 
and  ^  are 


c-AT-' 


K 

2  ** 


A-l 


with  the  property  that  Im  A„-0.  This  property  is  illustrated  in 
Fig.  I.  We  note  that  there  are  many  other  sequences  that  satisfy 


♦-«g  2 


A-l 


*A 


(27) 
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Fig.  2.  MAP  estimate  for  fixed  phase  acquisition. 


Let  C  and  $  be  the  estimators  corresponding  to  the  estimates 
c  and  ^>.  The  estimator  C  is  consistent,  unbiased,  efficient,  and 
minimum  variance  unbiased.  The  phase  estimator  <t>  is  not 
efficient  and  no  efficient  estimator  exists.  It  is  consistent  but 
biased.  However  it  is  modulo-2  w  unbiased,  which  is  the  property 
we  want.  The  phase  estimate  <#>,  also  obtained  in  (8]  and  [17]  in 
different  ways,  is  illustrated  in  Fig.  2. 

Define  the  modulo-2  <r  estimator  error  A$— (^-$)mod  2v. 
We  may  write 

K~l  2  Z*e (28) 
*- 1 

The  statistic  K~,'2k^lZte~J*  is  <DL(c,o?/K).  The  Jacobian  of 
the  transformation  between  (C.  A<I>)  and  is  <?. 

Therefore  the  joint  density  of  C  and  A^  is 

■  “p"  5^  • 

(29) 

This  r  suit  is  equivalent  to  [22,  eq.  (9.46),  p.  413]  with  ap¬ 
propriate  change  of  notation. Jn  (29)  it  is  assumed  that  -*  < 

<v.  On  this  interval  g(c,  A$)  is  symmetrical  about  zero  and 
therefore  unbiased.  We  emphasize  that  $  is  only  modulo-2  n 
unbiased. 

VII.  The  Viter bi  Algorithm 

The  MAP  sequence  estimation  problem  is  stated  in  (18).  Note 
that  r*  satisfies  the  recursion 

r*-r1.,+  c*  cos( **-*»)  + In p(*n/*»  i). 

r,  -  1  r,  cos(«i., -♦,)  +  lnp(<^|/^0).  (30) 

The  so-called  path  metric  is 

cos(fa-fa)  +  In p(fa/fa.,).  (31) 

The  maximization  problem  for  obtaining  the  MAP  phase 
sequence  may  now  be  written 

max  max  ,r*_,  + In />(♦*/♦,_,)+  -!jc*  cos(fa-^K)  . 

(32) 

This  form  leads  to  the  following  observation:  the  maximizing 
trajectory  (call  it  ( <j>t  }f ),  passing  through  (  on  its  way  to  , 
must  arrive  at  ir  ,  along  a  route  (i, ),  *  that  maximizes  IV  . 


i,  ’  6*/  7 
£,■  «W7 
£,*  2t/7 

f,  *  0 

{, -2v/7 
£,-4»/7 
(«  *-6  v/7 

12  3  4  5 

k 

Fig.  3.  Phase  trellis  illustrating  evolution  of  surviving  phase  tracks 

For  if  it  did  not  we  could  retain  <(>*- 1  and  <f>*  and  replace 
2  with  a  different  sequence  to  get  a  larger  value  for  T*.  It 
is  this  observation  which  forms  the  basis  of  forward  dynamic 
problem. 

The  trellis  of  Fig.  3  illustrates  how  the  maximization  of  (32) 
proceeds.  Tabulated  values  of  p(4>*/<#>*- 1)  are  stored  in  a  square 
array  (or  vector  which  is  read  cyclically)  whose  dimensions 
depend  upon  how  finely  the  interval  [-w,  w)  is  discretized.  Let 
Sx2,  with  S “{£/}“,  be  the  finite-dimensional  grid  for  which 
/>(♦*/$*- 1)  defined.  That  is,  <pk  is  assumed  to  take  on  only 
the  values  ♦*-{/,  /—  1,2, ■  •  • ,  M,  for  each  k.  Let  rt(£,, {„)  be 
the  value  of  the  metric  rt  corresponding  to  a  phase  trajectory 
{<y}*  which  terminates  at  phase-state  £,  at  stage  k,  after  passing 
through  stage  £m  at  stage  k  -  I. 

The  algorithm  begins  with  a  computation  of  T, (£/,£;),  /— 
1,2,-  ■  • ,  At  based  on  measured  values  of  c(  and  ^(.  If  all  phase 
values  are  equally  likely  a  priori,  then  In  is  constant  on 

all  values  Otherwise  there  is  some  a  priori  weighting  in  favor 
of  some  of  the  A  new  measurement  pair  (c2,  fa)  is  obtained 
at  fc-2  and  r2({,,  (m)  is  computed  for  m— 1,2,-  ■  ■  ,M  using  a 
table  look-up  (for  example  in  a  read-only  memory)  for  the 
The  maximum  value  of  r2({,,|„)  is  de¬ 
termined  (over  all  originating  series  (m)  and  the  corresponding 
sequence  ({,,  £,)  is  saved  as  a  survivor  sequence  terminating  at 
at  stage  2;  denotes  the  originating  state.  The  survivor 
sequence  is  labeled  with  its  corresponding  length  r2({, ,  ii ).  This 
calculation  is  repeated  for  each  possible  value  of  phase  until  all 
pairs  ((,,  i,)  and  corresponding  lengths  r2({/t  {,),  /—  1,2,-  •  • ,  M , 
have  been  computed  and  stored  (for  example  in  a  random  access 
memory).  There  is  a  unique  survivor  sequence  corresponding  to 
each  slate  £,.  /■=  1,2, •  •  • .  M.  Caution:  In  the  pair  (£,.{,)  the 
originating  stale  £,  depends  on  f,;  i.e.,  £/“{,(£,).  The  measure¬ 
ments  r2  and  fa  may  now  be  discarded  along  with  all  extinct 
sequence.'  A  new  measurement  pair  ( r3,  i//3)  is  now  obtained, 
and  the  procedure  continues. 

Let  (^| (k),  fa(k),-  ■  • ,  <P*(  A ))  be  the  MAP  sequence  based  on 
k  measurements;  this  sequence  has  the  maximum  value  of  I\. 
The  parenthetical  notation  ( k )  denotes  dependence  on  measure¬ 
ment  interval.  In  general  the  MAP  sequence  estimate  (^,(A  + 
!)•  ••  •  ,^*  +  i(*+  I)  based  on  measurements  up  to  stage  k+  1 
may  differ  from  the  previous  sequence  estimate  at  every  stage 
from  I  to  k.  However,  as  a  practical  matter,  one  can  choose  a 
sufficiently  large  depth  parameter  k0  so  that  the  sequence  of 
fixed-lag  estimates 

fa  »„<*>•  *-*o+l.*0+2.--  (33) 

gives  an  approximate  MAP  sequence  estimate.  Here  4,  _*,(*)  is 
simnlv  the  phase  value  kn  stases  back,  in  the  MAP  seauence 
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estimate  based  on  A;  measurements.  In  this  way  one  obtains  a 
phase  track  with  delay  k0. 

Following  Forney  (23)  we  may  summarize  the  storage  and 
computational  requirements  for  the  phase  tracking  algorithm  as 
follows. 

Storage 

A  (time  index), 

(£/.{(■  .'  ).  /“  1.2, •  •  ■ ,  A/  (survivor  phase  sequence 

terminating  in  £,  at  stage  k  ), 

!  *(€/.  ).  /”  1.2,- ■  ■ ,  M  (survivor  metric). 

p(  ii/im  )■  f.  I,  -  ■  •  ,  M  (transition  probability  matrix) 

Initialization 
k~  1, 

li-l,.  /«l."  ,M. 

•  i(£/.i/)-ln/’(<fri-4,/<j>o)+  -^2c,  cos( iAi-£/). 


Recursion 

r*+ r^D+inp^ ,  -6M  -U 

+  r^i2c*  +  ,  cos(^*+,-6).  /-  1,2,-  •  •  ,M, 

2  oi 

min  r*+ i(4i»Cn)*  f"  f»2,'  •  •  ,W, 

Measurement  /  Computation 

ck  (envelope), 

•At  (phase), 

c*cos(£,-ik)  +  lnp(*»-t,/4*-i“t«)  (pathmetric). 

In  Fig.  3  typical  trajectories  lot  this  algorithm  are  illustrated. 
The  heavy  lines  denote  survivors  and  the  light  lines  denote  path 
metric  calculations  that  are  made  and  then  discarded  in  favor  of 
survivors.  At  the  third  stage  all  calculations  r3((2,  („)  are  il¬ 
lustrated  with  light  lines;  the  heavy  line  from  (3  to  f  2  illustrates 
that  this  path  gives  maximum  r2({2,  („)  and  is  therefore  labeled 
a  survivor.  Of  course  £2—(3.  The  letters  x  on  the  trellis  illustrate 
sequences  that  have  survived  for  a  while  before  being  ex¬ 
terminated  by  the  weight  of  evidence.  The  very  heavy  line  at 
each  measurement  stage  k  denotes  the  current  MAP  sequence. 
The  labeling  numbers  on  the  heavy  paths  denote  the  current 
MAP  sequence.  The  sequence  of  end  points  labeled  with  the 
numbers  is  a  sequence  of  phase  estimates.  Note  this  sequence  of 
phase  estimates  differs  from  the  MAP  phase  sequence.  The  latter, 
being  a  smoothing  solution,  is  in  fact  generally  smoother  than 
the  former. 


In  Fig.  4  Monte-Carlo  simulation  results  for  the  Viterbi  tracker 
are  presented  for  the  parametrizations  commonly  considered  in 
the  literature  The  results  are  compared  with  the  point  mass  filter 
(PMF)  [8J,  tne  Fourier  coefficient  filter  (FCF)  [7],  the  linear 
quadrature  filter  (LQF)  [9],  and  the  Gaussian  sum  filter  (GSF) 
[13],  Also  shown  are  our  simulation  results  for  the  PLL.  These 
results  are  presented  to  legitimize  the  simulation.  The  Viterbi 
tracker  makes  up  more  than  1.0  of  the  2.0  dB  performance  gap 
between  the  PLL  and  an  idealized  linear  tracker.  In  terms  of  rms 
phase  error  (in  radians),  the  comparison  between  the  PLL  and 
the  Viterbi  tracker  goes  as  follows.  The  PLL  has  an  rms  phase 
error  of  1.26  rad  at  r— 1.0.  The  maximum  achievable  percentage 
improvement  is  21  percent,  corresponding  to  an  ideal  filter  with 
rms  phase  error  of  1.0  rad.  The  rms  error  for  the  Viterbi  tracker 
operating  at  1.0  with  A0— « 10  is  1.12.  This  represents  an 
improvement  of  11  percent  over  the  PLL.  The  results  for  Ao—0 
show  that  (as  expected)  the  Viterbi  tracker  is  not  as  good  as  a 
PLL  as  a  zero-lag  filter.  In  Fig.  4  the  heavy  squares  denoted  by 
VT(MAP)  (see  the  symbol  key)  correspond  to  the  smoothing 
variance  achieved  when  the  MAP  sequence  for  a  500  sample  run 
is  used  as  the  phase  estimate.  The  results  arc  averaged  over  40 
such  runs.  The  tabulated  results  in  Fig.  4  summarize  the  perfor¬ 
mance  characteristics  of  many  different  nonlinear  phase  track¬ 
ers.  In  the  figure,  performance  results  for  the  Viterbi  tracker  are 
plotted  just  to  the  left  of  their  true  positions  to  avoid  cluttering 
the  presentation. 

We  hasten  to  emphasize  in  the  interest  of  fair  play  that  all 
results  presented  here  for  nonzero  k0  are  in  reality  smoothing 
solutions.  Such  solutions  are  expected  to  deliver  the  usual 
smoothing  gains  over  filtering  solutions.  This  does  not  detract 
from  the  Viterbi  tracker  as  an  attractive  alternative  in  those 
applications  where  a  short  delay  may  be  accepted  in  exchange 
for  1-2  dB  performance  gains. 


VIII.  Performance  Results 

The  phase  space  [  -  w,w)  has  been  divided  into  A/  —11  equally 
spaced  points  and  the  Viterbi  algorithm  for  phase  tracking 
implemented  as  outlined  in  Section  VII.  The  crucial  conditional 
probabilities  p(4*  •■€*/♦» -i“&)  have  been  computed  as  out¬ 
lined  in  Section  III  and  stored  in  an  M  vector  for  cyclic  reading. 
Random  phase  trajectories  and  measurement  variables  have 
been  generated  according  to  (8)  and  (I).  The  results  of  several 
Monte-Carlo  simulations  are  presented  in  Fig.  4.  Each  Monte- 
Carlo  result  has  been  obtained  by  running  the  Viterbi  phase 
tracker  (and  the  PLL)  over  40  different  trajectories,  each  trajec¬ 
tory  beginning  with  a  uniformly  distributed  phase  variable  at 
k  -  I  and  continuing  for  500  points.  Various  values  of  depth 
parameter  A0  have  been  used,  as  indicated  in  the  figure.  (See  {8| 
for  a  discussion  of  corresponding  statistical  sampling  errors  and 
|  IK)  f < »r  additional  Monle-t  '.trio  results  ) 


IX.  Conclusions 

We  have  derived  a  Viterbi  algorithm  for  obtaining  approxi¬ 
mate  MAP  phase  sequence  estimates  on  { — sr,  w).  The  algorithm 
is  simple  and  fast  by  nonlinear  filtering  standards  and  ideally 
organized  for  hardware  implementation.  More  dramatic  perfor¬ 
mance  gains  than  those  illustrated  in  Fig.  4  may  be  achieved 
when  phase  fluctuations  are  severe,  i.e„  when  oj/o^»0.0l.  The 
reader  is  referred  to  [19)  for  applications  of  these  results  to  phase 
coherent  data  communication. 
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Abstract 

The  techniques  of  dynamic  programming  have  found  a  variety 
ot  successful  applications  in  signal  and  system  theory.  In  this 
paper  we  show  how  two  knotty  nonlinear  filtering  problems — phase 
and  frequency  tracking — may  be  formulated  and  solved  as  forward 
dynamic  programming  problems.  The  resulting  solutions  are  fixed 
interval  smooths  in  which  a  most  likely  sequence  is  passed 
i h rough  a  data  record. 

1 .  INTRODUCTION 

Phase  and  frequency  tracking  problems  comprise  some  of  the 
most  nettlesome  nonlinear  filtering  problems  in  the  realm  of 
signal  processing.  These  problems  have  held  the  interest  of 
control  and  communication  theorists  at  least  since  1953/54  when 
Cohan  and  Parks  [1]  and  Youla  [2)  published  their  work  on  maxi¬ 
mum  likelihood  and  optimum  demodulation  on  an  interval.  Over 
the  years  Cox  [3],  Viterbi  |4],  Calm  |5),  Forney  [6],  and  a  host 
of  others  have  advocated  dynamic  programming  for  the  solution  of 
nonlinear  filtering  problems.  This  Is  a  paper  in  the  same  tra- 
d i t  ion. 

In  this  paper  we  discuss  lorward  dynamic  programming  as  a 
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I  requencv  modulated  sequence  In  puss  thiou>e  .1  dull  i.rl  .  the  key 
idea  is  to  poso  .1  Markov  chain  model  on  I  In-  ciicle  |(),,’  .)  lor 
plioso  oi  frequency,  and  tlion  m'lH'r.ilr  rand  I  dal  MAI’  sii|iii'iii'i'»  that 
arr  consistent  with  Ihr  data  and  l  hr  piohahilitv  si  ruc- 

lure.  More  details  mav  he  found  in  |/|  \  |K|. 

>.  I'llASK  SEQUENCE  KSTI NATION 

Figures  IS,'  depict  two  classical  phase  estimation  problems: 
constant  phase  estimation  and  random  walk  phase  estimation.  In 
these  figures  and  throughout  the  paper  l  denotes  the 

data  set  and  the  r»^,k*l,l+l,  ,t  are  complex  i.i.d.  N(0,  Ir.v.s, 

A  Markov  transition  density  (or  probability  mass  function)  is 
denoted  p ( * / * ) ;  f ( * , denotes  a  joint  density  function; 

|-1  denotes  integer  part. 


Figure  1:  Constant  Phase  Estimation 
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zk  -  exp(  H)  f  nk  ,  k.  |  I  ,  J . I  I 

ll  Is  a  at ralght forward  exceriso  in  maximum  likelihood  (Ml.)  theory 
1 1.  show 


t 

*  arg  /:  a 
k-1 


\s  shown  In  Figure  la  this  esl  Iraalor  maximizes  the  log-likelihood 

..I  lz. 


•  •  »tj  Is 

arg  max  v.n  f  (z^,. 
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.comet rically ,  the  estimator  is  obtained  by  piecing  measurements 
together,  feather-to-tip,  and  measuring  the  angle  to  the  resulting 
vector.  This  is  illustrated  in  Figure  lb.  The  diagram  in  Fig¬ 
ure  lclllustrates  that  if  each  measurement  is  rotated  through  an 
angle  $,  and  each  rotated  measurement  added  to  the  previous,  the 
result  is  purely  real. 


Problem  #1:  Random  Walk  Phase  (Figure  2) 

Here  the  problem  is  to  map  the  data  set  Z into  a  phase  se¬ 
quence  estimate  [0,211)*  when  measurements  are  gener¬ 
ated  according  to 


zk  -  exp(j*k)  +  nfc  k«l,2,...,t 
p(«k/*k_i>  given 

As  shown  in  Figure  2a,  the  MAP  phase  sequence  maximizes  the  log- 
likelihood  of 

(♦j. •••»♦£>  ■  arg  max  vn  f £ (zJ  , . . .  ,zt . ft> 


Here  f  is  the  joint  density  of  the  measurements  and  the  phase 
sequence.  The  likelihood  f  =  log  f  may  be  written 


t-1 


ii. 


z  -«• 
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+  log  p(*t/*t. j) 
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11  ,  ••lkl  ■■  v .  1 1  hi  *  ■  *  ill  .1  <1  iscrel  set  (say  i|.'./il,i|  <1,  I , .  .  .  ,lj- 1) 

•in  .  .in  i  iiij)  lenient  .i  tlvii.inif  i  |in>|’ i  .nuiiii  nn  al  p,nr  1 1  Inn  nn  |  In*  lattice 
it  ri>;ur>  i’li  In  decode  (lie  MAI’  se<|  nemos.  See  |/|  lor  tie  (.ills  of 
(lie  1 1  uni  i  ( lim  .uni  .1  comparison  of  performance  with  oilier  phase 
i 'si  i  ni.it  i  on  a  I  t*or  i  t  lims  . 

I.  KKKiJIlKNCY  SKlJtlKNCK  KSTIMATION 

Pi  pul  es  1  aiul  4  depict  two  classical  I  roi|tiency  esl  im.it  Ion 
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X(u>)  -  >  z.  e*k'“  (Fourier  transform) 
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V.  shown  In  Figure  in  this  estimator  maximizes  the  log-  1  i ke  1  I hood 

• . .2  1: 

arg  max  f  (z() . zf  ;u>) 
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An  obvious  approximation  strategy,  illustrated  in  Figure  lb , 
is  to  zero-pad  the  measurements  and  estimate  w  as  the  OFT  cell 
where  a  maximum  occurs: 

- j2xqk/Q 

•it  =  arg  max  >.  z.  e  J  , 

ui  “q2»/Q  k“0 

q 

\  "  0  •  k  ’  1 


I'robiem  ml:  Random  Walk  Frequency  Estimation  (Figure  4) 

Here  the  problem  is  to  map  the  data  set  Zt  into  a  frequency 

sequence  estimate  (u». ,tii.  Ic  (0,2x)  ^ when  measurements 
are  generated  according  to 

zR  ■  exp(jstjk^Njk)  +  nR  ,  k-0,l,...,t 

p("’|k/Nj/ul(k/N]-l>  Riven 

As  shown  in  Figure  4a  the  MAP  phase  sequence  maximizes  the  log- 
likelihood  of  (Zg,...,ztl. 


|  o  | .  |t/N])  =  i,rR  mi‘X  ,t  *n  ft(20 . Vw|0|' 

w[ k/N ]  k=0 
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data  bliu'k  til  N  samples.  So  it  .j 
sol  (say  » /<?»€!=•>,  1  ,...,<)-!)  ,  one  1  ^ 


lakes  values  in  a  til  Hereto 
can  implement  a  tlynamio  p  logrammi  up,  a  I  p.or  i  l  Itm  on  the  lattice  of 
Figure  4b  to  tleootle  the  MAI’  sequence .  See  |H|  tor  details. 


4.  CONCLUSIONS 


The  problems  discussed  here  generalize.  'Fhe  basic  Idea  Is 
to  select  states  and  transition  probabilities  to  characterize  an 
underlying  probabilistic  structure,  and  then  to  assign  characters 

f^k  ’“‘k/Nl^ 

(such  as  e  or  e  1  )  to  tlte  states.  The  resulting  sequence 

estimation  algorithms  are  attractive  because  storage  goes  like  Q 
(number  of  states)  and  computations  arc  naturally  parallel. 
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A  Dynamic  Programming  Algorithm  for 
Phase  Estimation  and  Data  Decoding 
on  Random  Phase  Channels 

ODILE  MACCHI,  member,  ieee,  and  LOUIS  L.  SCHARF,  senior  member,  ieee 


Aka  trad— Tb*  problem  of  simultaneously  estimating  phase  and  decod¬ 
ing  data  symbols  from  baseband  date  is  posed.  The  phase  sequence  is 
assumed  to  be  a  random  sequence  on  the  circle,  and  the  symbols  arc 
assumed  to  be  equally  likely  symbols  transmitted  over  a  perfectly  equalized 
channel.  A  dynamic  programming  algorithm  (Viterbi  algorithm)  is  derived 
for  decoding  a  maximum  a  posteriori  (MAP)  phase-symbol  sequence  on  a 
finite  dimensional  phase-symbol  treHs.  A  near  and  interesting  principle  of 
optimality  for  simultaneously  estimating  phase  and  decoding  phase- 
amplitude  coded  symbols  leads  to  an  efficient  two-step  decoding  procetoe 
for  decoding  phase-symbol  sequences.  Simulation  results  for  binary,  8-ary 
phase  shift  keyed  (PSK),  and  16-quadrature  amplitude  shift  keyed  (QASK) 
symbol  sets  transmitted  over  random  wafc  and  sinusoidal  jitter  channels  are 
presented  and  compared  with  results  one  may  obtain  with  a  decision-directed 
algorithm  or  with  the  binary  Viterbi  algorithm  introduced  by  Ungerboeck. 
When  phase  fluctuations  are  severe  and  when  occasional  large  phase 
fluctuations  exist,  MAP  phase-symbol  sequence  decoding  on  circles  is 
superior  to  Ungetboeck’s  technique,  which  in  turn  is  superior  to  decision- 
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I.  Introduction 

PHASE  FLUCTUATIONS  can  significantly  increase 
the  error  probability  for  symbols  transmitted  over  a 
channel  that  may  or  may  not  have  been  equalized.  This  is 
especially  true  for  phase  shift  keyed  (PSK)  and  quadrature 
amplitude  shift  keyed  (QASK)  symboling,  in  which  case 
accurate  phase  discrimination  is  essential  for  symbol  de¬ 
coding.  Even  when  the  receiver  contains  a  decision-directed 
phase-locked  loop  (DDPLL).  performance  loss  in  signal-to- 
noise  ratio  (SNR)  with  respect  to  a  coherent  decoding 
system  can  be  in  the  range  5- 10  dB.  This  fact  is  established 
in  (1)  for  practical  symbol  sets  and  typical  values  of  the 
phase  variance  parameter  and  symbol  error  probability. 

On  telephone  lines,  linear  distortion  and  phase  jitter 
dictate  the  use  of  a  channel  equalizer  and  some  kind  of 
phase  estimator  to  achieve  high  rate,  low  error  probability 
data  transmission.  A  common  approach  to  phase  estima¬ 
tion  and  data  decoding  is  to  use  a  decision-directed  algo¬ 
rithm  in  which  a  phase  estimate  is  updated  on  the  basis  of 
old  phase  estimates  and  old  symbol  decisions.  The  DDPLL 
of  [5]  is  a  first-order  digital  phase-locked  loop  (PLL)  in 
which  the  phase  estimate  is  updated  on  the  basis  of  a  new 
measured  phase  and  an  old  symbol  decision.  In  the  jitter 
equalizer  (JE)  of  (3]  and  [4]  a  complex  gain  is  updated 
according  to  a  simple  decision-directed  stochastic  ap- 
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proximation  algorithm.  The  complex  gain  is  used  to  scale 
and  rotate  the  received  signal,  thereby  correcting  phase 
jitter  and  normalizing  rapid  fading  variations.  Although 
there  is  no  explicit  interest  in  phase  estimation  itself  in  the 
JE,  it  is  possible  to  interpret  the  structure  as  an  adaptive 
gain-phase  correcting  equalizer. 

Both  the  DDPLL  and  the  JE  are  very  simple  to  imple¬ 
ment,  but  apparently  neither  achieves  optimality  with  re¬ 
spect  to  any  statistical  criterion  for  symbol  (or  data)  decod¬ 
ing.  Furthermore,  neither  the  DDPLL  nor  the  JE  is  opti¬ 
mum  for  estimating  and/or  correcting  phase.  Both  are 
zero-lag  phase  estimators  that  cannot  benefit  from  future 
signal  samples.  Therefore,  an  important  question  to  be 
answered  is  whether  or  not  symbol  decoding  can  be  im¬ 
proved  using  a  better  phase  estimator.  The  answer,  based 
on  the  results  of  [1]  and  this  paper,  is  that  significant 
improvements  can  be  realized  when  the  phase  fluctuations 
are  severe  if  one  is  willing  to  pay  the  price  of  an  increased 
computational  burden.  In  practice,  cases  of  severe  phase 
fluctuation  can  occur  in  high  data  rate  PSK  and  QASK 
systems  in  which  the  angular  distance  between  symbols  is 
small. 

In  [1]  Ungerboeck  recognized  the  potential  of  maximum 
a  posteriori  (MAP)  sequence  estimation  for  jointly  estimat¬ 
ing  phase  and  decoding  data  symbols.  A  path  metric  was 
derived  and  its  role  in  a  forward  dynamic  programming 
algorithm  for  obtaining  MAP  phase-symbol  sequences  was 
indicated.  Because  of  the  way  phase  was  modeled  in  [  I],  the 
dynamic  programming  algorithm  could  not  be  solved  di¬ 
rectly.  Ungerboeck  approximated  the  phase  sequence  as  a 
process  that  could  make  discrete  binary  jumps  and  then 
derived  a  dynamic  programming  algorithm  for  decoding 
likely  paths  around  a  developing  most  likely  path.  The 
result  is  a  tree-search  algorithm  which  may  branch  left  or 
right  but  never  go  straight.  He  obtained  performance  re¬ 
sults  that  were  on  the  order  of  3  dB  superior  in  SNR  to  the 
DDPLL  in  a  16-QASK  system,  at  interesting  values  of  the 
phase  variance  parameter.  We  call  the  algorithm  of  [1]  a 
discrete  binary  Viterbi  algorithm  (DBVA).  The  reader  is 
referred  also  to  (51  and  (6)  for  discussions  of  other  subopti- 
mal,  but  computationally  tractable,  algorithms  for  simulta¬ 
neously  estimating  phase  and  decoding  data  symbols. 

In  this  paper  we  observe  that  baseband  data  is  invariant 
to  modulo-2ir  transformations  on  the  phase  sequence.  This 
motivates  us  to  wrap  the  phase  around  the  circle,  so  to 
speak,  and  obtain  folded  probability  models  for  transition 
probabilities  on  the  circle.  When  the  phase  process  is 
normal  random  walk  on  the  circle,  then  the  transition 
probabilities  are  described  by  a  folded  normal  model.  This 
model  has  also  been  used  in  (7]  and  (8).  It  is  then  straight¬ 
forward  to  pose  a  MAP  sequence  estimation  problem  for 
simultaneous  phase  and  symbol  sequence  decoding  as  de¬ 
scribed  in  [8]  and  [9].  The  basic  idea  is  to  discretize  the 
phase  space  { — w,  ir)  to  a  finite  dimensional  grid  and  to  use 
a  dynamic  programming  algorithm  (Viterbi  algorithm)  to 
keep  track  of  surviving  phase-symbol  sequences  that  can 
ultimately  approximate  the  desired  MAP  phase-symbol 
sequence.  The  MAP  phase-symbol  sequence  itself  is  the 
entire  sequence  of  past  phases  and  symbols  that  is  most 


likely,  given  an  entire  sequence  of  recorded  observations.  It 
is  this  use  of  “future”  and  “past”  received  signal  samples 
that  provides  performance  improvement  over  zero-lag 
estimators  such  as  the  DDPLL.  Details  of  the  algorithm 
are  given  in  (8)  and  (9).  For  PSK  and  QASK  symbol  sets  an 
interesting  principle  of  optimality  leads  to  an  efficient 
two-step  decoding  procedure.  With  this  procedure,  compu¬ 
tational  complexity  is  reduced  by  a  factor  greater  than  the 
number  of  admissible  phase  values  per  amplitude  level. 
This  amounts  to  a  factor  of  four  for  the  16-point  QASK 
diagram  that  has  been  recommended  by  CCITT  for  data 
transmission  on  telephone  lines  at  9600  bits/s.  Finally,  in 
order  to  make  the  computation  and  storage  requirements 
tractable  in  the  Viterbi  algorithm,  we  use  it  in  a  fixed  delay 
mode,  as  do  other  authors.  By  appealing  to  known  results 
for  fixed-lag  smoothing  of  linearly  observed  data,  we  are 
able  to  intelligently  choose  the  fixed  delay.  Without  signifi¬ 
cant  performance  loss  we  decode  phase-symbol  pairs  at  a 
depth  constant  of  k0  =  10.  This  obviates  the  need  for  huge 
storage  requirements  for  long  sequences.  With  these  mod¬ 
ifications  the  Viterbi  algorithm  becomes  a  feasible,  albeit 
sophisticated,  decoding  procedure. 

Simulation  results  for  the  proposed  Viterbi  algorithm 
(VA)  are  presented  for  several  symbol  sets  consisting  of 
two,  eight,  or  16  symbols.  Several  types  of  phase  jitter  are 
investigated  such  as  Gaussian  and  non-Gaussian  random 
walk  and  sinusoidal  phase  jitter.  The  resulting  error  proba¬ 
bilities  are  compared  with  those  of  the  simpler  decision- 
directed  algorithms  (JE  and  DDPLL)  and  with  those  of  the 
DBVA.  As  expected,  performance  of  the  VA  is  always 
superior  to  that  of  the  other  systems.  On  the  other  hand, 
the  increase  in  computational  burden  is  substantial,  and 
the  improvement  in  performance  is  not  always  great  enough 
to  warrant  the  use  of  the  VA.  In  our  concluding  remarks 
we  discuss  situations  in  which  one  might  reasonably  use 
the  VA  or  the  DBVA  rather  than  a  simpler  decision-directed 
algorithm  such  as  the  JE  or  the  DDPLL. 

Remarks  on  Notation: 

Throughout  this  paper  11  denotes  statistical  indepen¬ 
dence.  The  notation  {$*}f  will  mean  the  set  {♦*,  k  = 
1,2,-  •  -  ,K}.  When  the  indexes  1  and  K  are  missing  (e.g., 
{$*}),  it  is  understood  that  K  is  infinite.  The  symbol  N  + 
denotes  the  positive  integers.  The  notation  x :  Nx(p,o2) 
means  the  random  variable  x  is  normally  distributed  with 
mean  p  and  variance  o2;  Nx(p,  o2)  will  also  be  used  to 
denote  the  function  (2wcJ)-,/2exp {-(x  -  p)2/2o2}. 
When  x  is  complex,  x :  Nx(p,  o2)  means  x  is  complex  with 
density  Nx(p,o2)  =  (2ire2)“' exp  { -|  x  -  p\2/2o2}.  By 
f(x/y)  we  mean  the  conditional  probability  density  of  the 
random  variable  x,  given  the  random  variable  y.  Thus 
f(x/y)  is  generally  a  different  function  than  f(w/z),  even 
though  we  use  no  explicit  subscripting  such  as /„,,(•/•)  to 
indicate  so.  We  make  no  notational  distinction  between  a 
random  variable  and  its  realizations,  relying  instead  on 
context  to  make  the  meaning  clear.  A  density  function  for 
a  random  variable,  evaluated  at  a  particular  realization  of 
the  random  variable  is  termed  a  likelihood  function. 
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Fig.  2.  Symbol  diagrams  for  PSK  and  QASK  modulation  schemes. 


“Hatted”  variables  such  as  <f>k  refer  always  to  MAP  esti¬ 
mates  that  maximize  an  a  posteriori  density.  Finally,  it  is 
convenient  to  define  the  function 

M  oo 

g*(x)  =  M  _l  2J  2  h[x  —  I2ir  —  (m  —  \)2ir/M] 

m=  I  /=  -oo 

0) 

where  h(  )  is  a  probability  density.  The  function  gM() 
plays  an  important  role  in  our  discussion  of  phase-symbol 
decoding  on  QASK  symbol  sets. 

II.  Signal  and  Phase  Models 

Assume  complex  data  symbols  {a*}  are  phase  or  phase- 
amplitude  modulated  onto  a  carrier  and  transmitted  over  a 
channel  with  linear  distortion  and  phase  jitter.  The  re¬ 
ceived  signal— call  it  y(t)— is  typically  processed  as  il¬ 
lustrated  in  Fig.  1.  The  signal  y(t)  is  passed  through  a 
bandpass  noise  filter  and  demodulated  with  two  quadra¬ 
ture  waveforms.  The  resulting  complex  baseband  signal 
x,(f)  +  jxt(t)  is  equalized  with  a  complex  adaptive 
equalizer  in  order  to  reduce  the  intersymbol  interference 
due  to  linear  distortion  in  the  channel.  The  equalized  signal 
is  a  sequence  of  samples  at  symbol  rate  1/A  (A  is  the 
interval  between  successive  data  symbols).  The  output  of 
the  equalizer  is  a  complex  sequence  xk  =  x^  +  jx i2)  which 
is  a  noisy,  phase-distorted,  version  of  the  original  trans¬ 
mitted  sequence.  Thus  we  write 

xk  =ake»'  +  nk,  kGN+.  (2) 

Here,  (a*}  is  the  complex  symbol  sequence,  typically  en¬ 
coded  according  to  one  of  the  diagrams  illustrated  in  Fig. 
2.  The  sequence  {$*}  represents  phase  fluctuations  (jitter 
and  frequency  drift)  in  the  channel.  The  two  real  compo¬ 
nents  And  ni2)  of  the  complex  noise  sequence  nk  = 


+  jn^  are  the  noise  variables  in  the  respective  baseband 
quadrature  equalized  channels.  The  variables  and  n^2> 
can  be  shown  to  be  independent  when  the  carrier  frequency 
is  in  the  middle  of  the  input  noise  filter  bandwidth  and  the 
additive  channel  noise  is  white.  If  the  equalizer  is  perfect, 
then  nk  is  the  usual  Gaussian,  additive  noise  with  zero- 
mean.  If  the  equalizer  is  not  perfect,  then  nk  contains  a 
residual  of  the  intersymbol  interferences,  and  is  not  Gaus¬ 
sian;  nor  are  successive  variables  njtl),  /$}.„•  •  •,  indepen¬ 
dent.  However,  for  a  reasonably  good  equalizer,  we  may 
assume  that  {«*}  is  a  sequence  of  independent  identically 
distributed  (i.i.d.)  complex  Gaussian  variables.  Strictly 
speaking,  this  assumption  is  valid  only  at  the  input  to  the 
equalizer  when  the  baseband  equivalent  of  the  input  noise 
filter  and  low-pass  demodulator  is  the  so-called  sampled 
whitened  matched  filter  of  [10].  In  practice,  the  assumption 
of  Gaussianity  is  more  realistic  than  the  assumption  of 
independence  for  the  sequence  {n*}.  Assuming  that  the 
equalizer  of  Fig.  1  is  perfect,  we  model  the  noise  sequence 
{nk)  as  follows: 

nk  =  nV>  +  j k  G  N  + 

nV±Ln?\  V(*,/) 

njtl)-LLrtJl),  I#/,  niJ>J_LnP\  *#/  U 

0,o.2);  n?:Nmt{0,om2). 

Here  2a}  is  the  variance  of  the  complex  noise  variable  nk, 
and  a,2  is  the  variance  of  each  real  component. 

Consider  now  the  phase  distortion  {$*}.  The  term  gener¬ 
ally  reflects  two  effects,  one  long-term  and  the  other  short¬ 
term.  In  modem  high  speed  data  modems  no  carrier  or 
pilot  tone  is  transmitted  for  locking  the  local  oscillator  at 
the  receiver.  Thus  long-term  large-range  linear  phase  varia¬ 
tions  result  from  frequency  drift  in  the  channel  which 
cannot  be  eliminated.  In  addition,  nonlinear  intermodula- 
tion  with  local  power  supplies  gives  rise  to  short-term 
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Fig.  3.  Typical  phase  fluctuations:  phase  jitter  and  frequency  drift. 


small-range  phase  variations.  The  variations  exhibit  en¬ 
ergetic  harmonic  content  at  the  harmonics  of  the  funda¬ 
mental  power  supply  frequency.  Hence  a  realistic  model 
for  {<{>*}  is 

P 

<t>k  =  +  2mBk)  +2!  /1/Sin (2ffP;A:A+  p,).  k  £  N  + 

/=  i 

(4) 

where  p,  —  I  -50  Hz  or  vt  =  l  ■  60  Hz,  depending  on  the 
place  of  use.  A  typical  phase  process  is  depicted  in  Fig.  3. 
The  first  term  in  parentheses  in  (4)  is  the  so-called  frequency 
drift  term  and  the  summation  term  is  the  phase  jitter.  In 
practice,  the  constants  B,  { Ah  pt,  p,)/’  ,  vary  with  time 
fcA  but  at  an  extremely  slow  rate. 

The  spectrum  of  the  phase  jitter,  i.e.,  the  behavior  of  A, 
versus  vh  has  been  investigated  experimentally  in  (14).  The 
spectrum  is  roughly  fitted  by  a  1  /p2  curve.  A  phenomeno¬ 
logical  model  for  phase  having  a  l/f!  spectrum  (like  that 
of  phase  jitter  at  high  frequencies)  is  the  Wiener- Levy 
continuous  time  process. 


dt 


w(r). 


r  >0, 


(5) 


where  {w(r)}  is  a  white  noise  process.  The  discrete  time 
analog  is  the  independent  increments  sequence 


<t>k  =  I  +  *k  •  k  e  N  *  (6) 


where  { wk }  is  a  sequence  of  i.i.d.  random  variables  with 
even  probability  density  h(w).'  When  wk :  Nh{ 0,  a2),  then 
(<>*}  is  the  so-called  normal  random  walk. 

For  short-term  fluctuations,  the  model  captures,  with 
appropriate  selection  of  h(w),  the  correlated  evolution  of 
phase.  The  main  virture  of  the  independent  increments 
model  is  that  it  forms  a  convenient  basis  from  which  to 
derive  optimum  estimator  structures  which  may  then  be 
evaluated  against  more  realistic  phase  sequences. 

Since  the  measurement  model  of  (2)  is  invariant  to 
moduIo-2ir  translates  of  <f>k,  we  may  represent  phase  as  if  it 
were  a  random  sequence  on  the  unit  circle  C  or  equiva¬ 
lently  on  the  interval  {  —  ir,  ir).  Call  <t>k  this  representation 
of  <t>k.  Note  +  ,  may  be  written 


*a  +  i  =«*+•*’* 


(7) 


where  the  plus  sign  denotes  modulo-2ir  addition  of  real 
variables  or  equivalently  rotation  with  positive  (counter¬ 
clockwise)  sense  on  C.  The  variable  wk  is  a  modulo-277 
version  of  wk. 

The  conditional  density  of  $*  +  1  +  »>k,  given  <f>4,  is 

*(♦*+1  ~  ♦*)•  Since  <t>kt,f  is  a  modulo-2ir  version  of  <j>k  +  i. 


we  may  reflect  all  of  the  conditional  probability  mass  into 
C  to  obtain  the  transition  (or  conditional)  probability 
density 

00 

AW**)  =  2  +  i  —  4>t(  —  I2v) 

I—  ~  OO 

=  £i(^*+i_^a)  (8) 

where  g ,  is  the  function  defined  in  (1).  Hereafter,  g,()  is 
called  the  folded  density  of  the  phase  increments.  Usually, 
the  phase  increment  is  small  and  its  distribution  h(-)  is 
very  narrow  with  respect  to  2tt.  Therefore,  in  the  sum  of 
(8)  only  one  term  is  relevant  and  f(<t>k  +  l/<t>k)  =  *(<£*+,  ~ 
<t>k).  In  the  normal  case,  this  implies  o„  <  277,  where  o2  is 
the  variance  of  wk  ,_As  it  is  cumbersome  to  carry  around  the 
overbar  notation  <j>k  +  ,  —  4>k,  we  drop  it  with  the  caution 
that  from  here  on  <(>k  is  defined  on  C  unless  otherwise 
stated. 

In  the  normal  case  [7],  [8],  the  density  g^Att  ~  <t>k) 
may  be  written 

Si(<>*  + 1  -<#>*)  =  2  N*k.t(*k  +/2”<°Z)-  (9) 

l—  —  OO 

This  case  and  the  Cauchy  case  (in  which  the  distribution 
tails  are  much  heavier  than  the  normal  tails)  are  studied  in 
the  Appendix.  It  is  shown  that  g,(x)  achieves  its  maximum 
at  x  =  0  and  that  it  is  monotonically  decreasing  on0<x 
<  ir. 

The  sequence  {<^}f  is  Markov.  Therefore,  we  may  write 
for  the  joint  density  of  the  K  phases 

/({4>*}i)  —  /(<>*  + ,/<>*) 

/({*,/<>„})  -  f(<t>,)  ■  the  marginal  density  of  <f>,. 

(10) 

Usually.  <J>,  is  uniformly  distributed  on  C  because  phase 
acquisition  starts  at  k  =  1  with  no  prior  information  about 
its  value.  By  the  independence  of  the  nk  in  (2),  it  follows 
that  the  conditional  density  of  the  measurement  sequence 
{■**}?'  given  the  phase  and  data  sequences  {<M*.  (a*}*<  *s 

/({**}?/  {**}?.  fo}f)  =  n  NXt(ake^,o2). 

(11) 

Equations  (8)— ( 1 1)  form  the  basis  for  the  derivation  of  a 
MAP  sequence  estimator.  The  key  element  is  that  (<pk)  is  - 
Markov  sequence  with  a  bounded  range  space  (-ir, ir). 
Discretization  of  this  bounded  interval  leads  to  a  finite-state 
model  from  which  a  finite  dimensional  dynamic  program¬ 
ming  algorithm  can  be  derived. 

III.  Decision-Directed  Algorithms 

The  usual  way  of  dealing  with  phase  fluctuations  is  to 
design  a  phase  estimator  and  use  the  estimated  phase,  call 
it  to  rotate  the  received  signal  as  follows: 

yk  =  xke»',  k&N*.  (12) 

The  phase  corrected  signal  yk  is  then  fed  to  a  decision 
device  which,  in  tum,  delivers  the  symbol  estimate  dk. 


That  is.  h(w)  -  h(-w). 
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Typically,  the  phase  estimate  <f>k  is  functionally  dependent 
on  the  old  measurements  {  •  •  -,xk  _2,  xk  l)  and  the  past 
symbol  estimates  {  •  •  •,  aA_2.  a4_ ,}.  If  a  carrier  or  pilot 
tone  is  transmitted  as  in  typical  single  sideband  (SSB) 
systems,  then  $k  is  obtained  from  a  simple  phase-locked 
loop  (PLL).  In  suppressed  carrier  systems  such  as  PSK  or 
QASK  systems,  the  PLL  is  decision-directed.  That  is.  4>k  is 
updated  on  the  basis  of  ak  For  instance  in  [5] 

<>*  +  ■  =♦*  +  Mlm[x*a*e'*‘] 

-  4>k  +  M* sin  (arg  xk  -  arg ak  -  $*), 

M*=mK|2  (13) 

where  the  asterisk  denotes  complex  conjugate  and  p  is  a 
constant  that  depends  on  the  SNR.  The  estimator  of  (13)  is 
called  a  DDPLL. 

In  the  jitter  equalizer  (JE)  of  [3]  and  [4],  xk  is  rotated  and 
scaled  as  follows: 

yk  =  xfik 

Gk  =  G*-i  +  #*("*-!  ~Vk  i)**  ,•  (>4) 


(a)  <b) 

Fig.  4.  Geometry  of  phase  jitter  and  additive  noise  with  and  without 
phase  correction  of  DDPLL.  (a)  Without,  (b)  With. 

IV.  MAP  Phase  and  Symbol  Sequence 
Decoding  with  the  Viterbi  Algorithm 

The  basic  idea  behind  MAP  sequence  decoding  is  to  find 
a  sequence  of  phase-symbol  pairs  {<)>*,  a*}*  that,  based  on 
the  observation  sequence  {x*}*,  appears  most  likely.  The 
application  of  this  idea  to  data  communication  was  first 
proposed  in  [1]  and  refined  in  [9].  The  most  likely  se¬ 
quence,  call  it  a*},  is  the  sequence  that  maximizes  the 
natural  logarithm  (or  any  other  monotone  function)  of  the 
a  posteriori  density  of  {<j>k,  ak }f,  given  the  sequence  of 
observations  {xk  }*.  Thus  we  pose  the  maximization  prob¬ 
lem: 


The  complex  gain  Gk  is  the  single  complex  coefficient  of  a 
one-coefficient  rapidly  adaptive  equalize/.  We  may  think  of 
Gk/\  Gk  |  as  the  phase  correction  e  and  |  Gk  |  as  a 
gain  correction  ck.  Thus,  although  there  is  no  explicit 
formulation  of  a  phase-gain  estimation  problem  in  [3]  and 
[4j,  the  net  effect  of  the  JE  is  to  correct  phase  and  normal¬ 
ize  rapid  fading  variations.  As  explained  in  (4],  when  phase 
fluctuations  are  large,  the  JE  performance  may  be  im¬ 
proved  by  setting  a  constraint  on  Gk  that  keeps  its  value 
inside  a  given  domain  including  the  complex  point  (0.  1). 

Geometrical  Comments 

The  combined  effects  of  random  phase  fluctuations  and 
additive  noise  may  be  illustrated  as  in  Fig.  4(a).  The 
transmitted  symbol  ak  —  a{0)  (say)  is  rotated  by  the  ran¬ 
dom  phase  angle  <t>k  to  give  akeJ*‘.  To  this  is  added  the 
complex  noise  sample  nk  to  give  the  measurement  xk 
defined  in  (2).  For  the  case  illustrated,  the  resultant  mea¬ 
surement  is  closer  to  symbol  a* 11  than  to  a(0’  and  conse¬ 
quently,  with  no  phase  or  phase-gain  correction,  a  decod¬ 
ing  error  would  be  made.  To  emphasize  the  combined 
effects  of  phase  fluctuation  and  additive  noise,  we  have 
illustrated  a  case  for  which  either  phase  jitter  or  additive 
noise  alone  would  cause  no  error.  See  [1 1]  for  a  probabilis¬ 
tic  discussion  of  this  issue.  Fig.  4(b)  is  an  illustration  of 
how  a  DDPLL  works.  The  angle  <pk 's  the  noisy  measured 
phase  (arg  xk )  minus  the  sum  of  the  phase  of  the  decoded 
symbol  and  the  previously  estimated  phase  (arg  dk  +  4>k ). 
A  given  amount  pk  of  this  angle  is  added  to  <pk  as  a 
correction  to  get  the  new  phase  estimate  4>k  + ,  =  4>k  + 
pk4'k'  Note  that  only  phase  is  corrected.  In  the  JE  both 
phase  and  gain  are  corrected,  offering  potential  for  im¬ 
proved  performance.  This  potential  is  particularly  im¬ 
portant  in  QASK  symbol  sets  where  amplitude  errors  in  xk 
can  result  in  decoding  errors. 


max  In /({**}?,  {ak)KJ  {.**}?).  (15) 

(♦»)).(«*}* 

This  is  equivalent  to  maximizing  the  natural  logarithm  of 
the  likelihood  function /({x*}f.  {<)>*}f.  W}f  ).  obtained  by 
evaluating  the  joint  density  function  for  {x*}*.  (<Mf.  and 
{a*  )f,  at  the  observed  values  of  {x*)f.  Using  the  results  of 
(10)  and  (11)  we  may  write 

/({**)?.  KJf) 


II  Nu{(1ke",’i'°n)f(‘l>k/'t'k  l) 


A  =  1 


/({«*}?)•  (16) 


Assuming  the  to  be  a  sequence  of  independent, 

equally  likely  symbols,  using  (8),  and  neglecting  irrelevant 
constants,  we  may  write  the  maximization  problem  as 


max  1 K 

{♦*  If.  («*>*' 


4?,'“ 


ake 


j+k  |2 


+  2  lnSi(<#>A  “  4>*  i)  +  (17) 

k  =  2 

Note  that  satisfies  the  recursion 

r*  =  r*_,  +pk.  k  =  2.3,--- 

Pk  =  ~~i\xk  ~  akeJ*‘  |2  +  In g,(«>*  -<*>*-,). 

k  =  2,3,  ••• 


r,  = 


i 

2  a,2 


I  x,  -  |2  +  In  /(<#>, ) 


(18) 


where  pk  is  the  so-called  path-metric.  For  convenience,  let 
us  make  explicit  in  I\  the  last  phase  and  symbol: 
r*(<fk*.  aK).  The  other  arguments  {a*}*-1,  re- 
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main  implicit.  Then,  from  (18) 

!*(**>  °k)  —  fltf-i) 

+Pk(xK’  aK'  $K<  *X-l)'  0®) 

Thus  the  maximizing  sequence— call  it  ({**}?,  {d*}*)— 
passing  through  (**_,,  aK-i) on  >ts  way  10  (*x>  dK),  must 
arrive  at  (**_,,  d*_,)  along  a  route  ({**}f-2,  {d*}*-2) 
that  maximizes  !*_,(**_ ,,  d*-!).  It  is  this  observation 
which  forms  the  basis  of  forward  dynamic  programming. 
In  the  actual  implementation  of  a  dynamic  programming 
algorithm,  one  must  discretize  the  phase  space  C  to  a  finite 
dimensional  grid  of  phase  values  a  =  {£„}”=  |.  The  func¬ 
tion  In  g,(<t>k  —  **_,)  is  then  defined  on  the  two- 
dimensional  grid  a  X  E.  However,  as  discussed  in  (8]  and 
[9],  the  resulting  m  X  m  matrix  of  conditional  probabilities 
has  Toeplitz  symmetry  which  means  only  an  m  vector  of 
conditional  probabilities  must  be  computed  and  stored. 

The  Viterbi  algorithm  for  simultaneous  phase  and  sym¬ 
bol  decoding  consists  simply  of  an  algorithm  which  de¬ 
termines  survivor  phase-symbol  sequences  terminating  at 
each  possible  phase-symbol  pair.  One  of  these  surviving 
sequences  is  ultimately  decoded  as  the  approximate  MAP 
phase-symbol  sequence.  The  complexity  c  of  the  algorithm 
lies  mainly  in  the  evaluation  of  the  mM  possible  values  of 
|  xk  -  akeJ*‘  |\  for  each  new  measurement  xk.  Here  M  is 
the  symboling  alphabet  size,  and  m  is  the  number  of  dis¬ 
crete  phase  values.  For  each  calculation  of  |  xk  —  akeJ*i  |2 
there  are  six  real  multiplies.  Compared  to  this  multiplica¬ 
tion  load  of  6mM  per  sample,  the  determination  and 
addition  of  the  m  possible  values  of  In  %,(<t>k  -  **_,)  that 
appear  in  (18)  is  negligible.  The  determination  of  |x*  - 
ake'*t  |2  would  likely  be  computed  in  a  pipelined  parallel 
architecture,  while  the  terms  In  g,(  • )  would  be  read  by 
appropriately  addressing  read  only  memory  (ROM).  When 
short-term  phase  fluctuations  have  small  amplitude  (ow 
small)  so  that  m  must  be  large  for  accurate  phase  tracking, 
the  complexity  increases.  For  example,  with  M  =  8  and 
m  =  48,  c  ~  384,  indicating  on  the  order  of  2  X  10 3  com¬ 
putations  at  each  fc-step. 

As  we  show  in  the  next  section,  the  complexity  of  the 
Viterbi  algorithm  can  be  dramatically  reduced  by  making  a 
change  of  variable  and  tracking  a  total  phase  variable  that 
is  the  sum  of  <f>*  and  the  symbol  phase,  arg  ak.  Also,  of 
course,  for  PSK  symbol  sets  only  one  symbol  amplitude  is 
admissible,  and  admissible  symbol  phases  may  be  chosen 
to  fall  on  one  of  the  discrete  phase  values.  Thus  for  PSK 
symbol  sets  the  complexity  is  simply  m,  and  the  number  of 
path  metric  computations  is  on  the  order  of  300  for  m  =  48. 
Even  this  figure  may  be  reduced  by  using  one  of  a  variety 
of  so-called  M  algorithms  in  which  all  surviving  phase- 
symbol  pairs  are  saved,  but  only  a  handful  of  candidate 
originator  pairs  are  considered  for  each  survivor  (16]-[18). 

V.  A  Principle  of  Optimality  for  Phase-Amplitude 
Coded  Symbols  and  an  Efficient  Two-Step 
Decoding  Procedure 

In  order  to  simplify  matters  and  to  illustrate  the  key 
ideas,  let  us  consider  PSK  symbols  of  the  form 

ak  =  eJt *  (20) 


with  { 8k )  drawn  independently  from  an  M- ary  equiproba- 
ble  alphabet  ©={(/-  l)2ir/Af}^,.  Write  the  measure¬ 
ment  model  of  (2)  as 

x*  =  e*‘  +  nk  (21) 

where  the  total  phase  ik  is  represented  as 

**=**  +  ek 

ek  =  l  a ek  =ek-  **_„  a*,  =  (22) 

/=  1 

It  is  clear  that  6k  =  2*=  ,A^  and  <j>k  =  tpk  —  6k.  Thus  we 
may  replace  the  MAP  sequence  estimation  problem  posed 
in  (15)  by  the  problem 

max  {**}?,  (A0*)?)-  (23) 

The  joint  density  /*-/(-,-,•)  in  (23)  may  be  written 

fK  =  J  NXk{e»>,o*)f(+k,  Mk/ {*,}?"',  {A*,}*-') 

(24) 

where  for  k-  1,  /(*,,  A0,/-,  •)  is  simply  the  marginal 
density  /( ^, ,  Ad,).  The  conditional  density  on  the  right- 
hand  side  of  (24)  is  easily  evaluated  with  Bayes’  rule: 

/(**,***/{*>}?■'.  (^K-')  =/(**/ 

•/(^/{+y)J",,(^)!")-  (25> 

Now  A®*  is  independent  of  the  previous  data,  additive 
noise  and  phase  fluctuations.  Thus 

/K/{+y}r,4M,}r,)=jj?.  (26) 

Moreover,  if  we  rewrite  as 

=  *<t-i  +  wk  +  0*-i  +  ek  ~  ek- 1 

=  V'a-  t  +  A0*  +  wk,  (27) 

we  see  immediately  that 

/(+*/ {*>}?"'•  W?)  =*■(**  -**->  -*•*)• 

(28) 

Recall  <pk  is  defined  on  the  circle  C.  Therefore,  for  clarity 
we  might  think  of  **  as  a  random  variable 
**- 1  +  A0*  +  wk,  whose  density  is  folded  in  [— w,  v). 
Putting  (24)-(28)  together,  we  have  for  the  joint  density/* 

fK  =  J  ~  **-,  -  A 6k) 

A*,  =*„  *o=0.  (29) 

Principle  of  Optimality 

Call  (**}f,  (Atf*)*  the  MAP  sequences  that  maximize 
/*;  (A enters  only  in  the  g,(-)  term  on  the  right-hand 
side  of  (29).  Now  let  us  suppose  (as  is  usual)  that  g,(w), 
which  is  even,  is  also  unimodal  with  a  peak  at  w  =  0.  This 
single-mode  assumption  for  g,(-)  is  valid  in  particular 
when  the  phase  increment  wk  in  the  Markov  process  (6)  has 
a  Gaussian  or  Cauchy  distribution  h(w)  (see  the  Appen- 
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(a) 


M*4 

(C) 


g , ( R  (  x»  Wroppad  on 
lh«  Circl*  C.  M  •  4 


<d) 


Fig.  5.  Density  functions  of  phase  increment  before  and  after  folding. 


dix).  It  follows  that  fK  is  maximized  by  choosing 

(30) 

where  [jc]  denotes  the  closest  value  of  (/  —  \)2it/M  to  x. 
By  substitution  of  the  constraint  (30)  into  (29)  and  defining 
the  “rest”  function  R(x)  on  the  circle  C  by 

R(x)  =  *-[*],  (31) 

we  find  that  one  must  maximize 

fK  =  n  K„(eJ*^an)jfg\(R('llk  1))-  (32) 

The  maximization  of  }K  with  respect  of  is  formally 
equivalent  to  maximizing  the  joint  density /({xk}f,  ) 

when  the  total  phase  \pk  follows  a  Markov-model  similar  to 
(6): 

+k  =4>k-i  +  “*■  (33) 

Here  the  independent  increments  uk  have  “probability 
density,”  folded  on  the  circle  C, 

/(«)  =  ^g,(*(«)).  (34) 

This  interpretation  is  purely  formal  since  f(u)  is  not 
generally  a  probability  density.  However,  when 

*.<«)  =  0,  (35) 

then  f(u)  is  a  probability  density  because  in  that  case 

^gi(^(«))  =««(«)•  (36) 

Thus  (34)  can  be  interpreted  as  an  approximate  density 
when  the  peak  of  g(u)  is  narrower  than  the  minimum 
phase  distance  between  the  symbols.  This  condition  is 
always  satisfied  in  communications  applications;  other¬ 
wise,  phase  distortion  is  so  large  that  data  transmission  is 
not  possible.  Thus  we  have  a  pure  phase-tracking  problem 
as  in  (8]  and  [9],  and  we  may  proceed  accordingly.  Taking 


the  natural  logarithm  of  /*,  we  have  the  maximization 
problem 

maxTjJ., 

(<Mf 

r*  =  r*  - 1  +  Pk ; 

n  =  -  ij  +  in  *,(*(*,)) 

2<V 

Pk  =  ~  I  xk  ~  *’*'  |2  +  hi  g,[/t(**  -  , )]  (37) 

which  is  solved  by  the  dynamic  programming  algorithm 
discussed  in  Section  IV.  The  complexity  c'  of  this  algo¬ 
rithm  lies  essentially  in  the  evaluation  of  the  m  possible 
values  of  |  xk  -  eJ*k  |2  for  each  new  data  value  xk.  The  m 
different  values  of  In  g,  [/((-)]  will  be  precomputed  and 
stored  in  ROM.  For  each  computation  of  |  xk  -  ej*k  |2 
there  are  two  multiplies,  so  complexity  is  simply  propor¬ 
tional  to  m.  This  represents  a  reduction  in  complexity 
greater  than  M  for  Af-ary  PSK. 

Usually,  the  phase  is  differentially  modulated  rather 
than  directly  modulated,  and  therefore  the  relevant  symbol 
is  A$k  itself  (see  (30)).  For  the  purpose  of  data  transmission 
there  is  no  need  to  reconstruct  the  absolute  data  phase 
6k  =  2f=|A^,.  This  reconstruction  has,  however,  been  car¬ 
ried  out  in  the  simulations  in  order  to  recover  the  estimates 
Qk  ='bk  ~  °f  the  phase  fluctuations  and  to  get  the 
approximate  variance  of  the  phase  estimates 

2  !♦*  (39) 

*  *=i 

Density  Functions  and  Geometrical  Comments 

The  entire  development  of  this  section  has  a  nice  geo¬ 
metric  interpretation  which  we  illustrate  in  Fig.  5.  In  Fig. 
S(a)  the  basic  phase  noise  density  A(x)  is  illustrated  on 
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(-oo,  oo).  Fig.  5(b)  is  the  folded  version  g,(x)  of  h(x)  to 
account  for  the  wrapping  on  the  unit  circle  C.  Fig.  5(c)  is 
the  function  g,[R(.x)]  that  arises  in  our  discussion  of  the 
principle  of  optimality,  sketched  in  the  case  of  4-ary  phase 
modulation.  Fig.  5(d)  shows  g,[R(.x)]  wrapped  around  the 
circle  C.  Since  g,(x)  is  very  narrow,  g,[ /?( jc)]  is  approxi¬ 
mately  the  repeated  copy  of  g:,(x)  at  all  possible  values  of 
data  phase.  With  x  =  ik  ~  <fk-,.  Fig.  5(d)  illustrates  the 
choice  of  A0k  nearest  ifk  -  t£k  _ ,  ( A0k  =  w/2  /s  the  best 
choice  here),  and  the  resulting  value  of  g,(/?(^  -  <£*_,)) 
is  shown  by  the  heavy  segment  on  the  axis  \pk,  terminated 
by  the  heavy  dot. 

We  now  extend  this  principle  of  optimality  to  phase- 
amplitude  encoded  symbols.  Assume  the  independent, 
equally  probable  data  symbols  are  complex  symbols  of  the 
form 

ak  =  Ake,tk  (40) 

with  the  Ak  positive  real  numbers  drawn  independently 
from  the  alphabet  A  =  (a,,a2,-  •  -,at).  Denote  by  p(Ak) 
the  probability  mass  function  for  the  random  variable  Ak. 
Assume  the  6k  are  drawn  from  the  alphabet  B  = 
(/),,  p2,- ■  -  ,j8M  ).  Denote  the  conditional  probability  mass 
function  of  0k%  given  Ak,  by  p(6k/Ak).  For  the  (4,  4) 
diagram  of  Fig.  2(d), 

A  =  (v 2  a,,  3a,,  3/2  a,,  5a,); 

The  probabilistic  description  of  the  source  is 

p(Ak)=  1/4,  for  all  Ak 


p(0k/Ak  =  a,)  =  |^/4' 
P(0k/Ak  =  «j)  =  P(ek/Ak 

P(°k/Ak  =  «:)  =  |q/4' 


0k  ~  Pi'  P*'  Pb'  P» 

otherwise 

=  «i) 

&k  ~  Pi'  Pi'  Pi'  Pi 

otherwise 


P(0k/Ak  =  «*)  =  P(0k/Ak  =  « 2)-  (41) 

In  place  of  the  maximization  problem  posed  in  (23),  we 
write 


max 


(42) 


with  <pk  and  A0k  defined  as  in  (22).  The  density  /*(•,•,•,•) 
appearing  in  (42)  may  be  written 


/K  =  n*jAke'*‘,on2) 


■/(*k.A0k,Ak/ {Afy*  ',  {A,}*  ').  (43) 

The  conditional  density  on  the  right-hand  side  of  (43)  is 
simply 

/Uk,A0k,Ak/-,',) 

=  g,(tk  -+*-1  -  A0k)p(A0k/Ak,Ak_,)p(Ak)  (44) 


where p(A0k /Ak,  A k.,)  is  the  conditional  probability  mass 
function  for  A0k,  given  Ak  and  Ak_ ,.  Putting  (43)  and  (44) 
together,  we  have  as  the  joint  density  function  to  be 
maximized 

K 

fK=  n  NXt(Ake»',o:)gl(+k  -*k-i  -Mk) 

k  —  1 

■p(A0k/Ak,  Ak_i)p(Ak).  (45) 

It  is  important  to  note  in  this  expression  that  the  NXi(  • ,  ■ ) 
term  is  dependent  only  on  the  measurement  model;  g,(  • )  is 
dependent  only  on  the  random  phase  model,  and 
p(A0k/-,  )p(Ak)  is  dependent  only  upon  the  symboling 
constellation  (or  encoding  scheme).  Thus  (45)  is  a  useful 
canonical  decomposition  that  is  generally  applicable  to 
communications  problems  involving  additive  independent 
noise  and  independent  increments  phase  processes. 

For  the  (4,  4)  diagram  of  Fig.  2(d)  we  may  compute 
p(A0k/AkAk_t)  as  follows: 

p(Mk/Ak  =  <*,'  Ak- 1  =  «,) 

1/4,  A0k  =  P\,Py.  Ps'P-l' 

/,  j  even-even  or  odd-odd  .  . 

~  1/4,  A0*  (  ’ 

i,  j  even-odd  or  odd-even 

It  is  a  straightforward  matter  to  substitute  these  results  into 
(45)  and  derive  a  path  metric  as  in  (37). 

VI.  Linear  Performance  Results  and  the 
Selection  of  a  Fixed  Lag 

There  is  one  more  simplification  to  be  made;  namely,  the 
selection  of  a  depth  constant  k0  such  that  phase-symbol 
pairs  may  be  decoded  at  a  fixed-lag  k0,  thereby  obviating 
the  need  to  store  long  survivor  sequences.  Call  (i/'*/*}?  the 
MAP  phase  sequence  based  on  measurements  (xt}f.  The 
subscript  k/K  indicates  that  \fk/K  depends  on  all  measure¬ 
ments  up  to  time  K.  In  general  the  MAP  sequence 
{4>k/K+  ,}f 4 1  based  on  measurements  to  time  ( K  4-  1)  may 
differ  from  {>£*/*}*  at  all  values  of  1  S  k  K.  However, 
one  expects  that  for  large  K  and  for  k  S  K  —  k0,  the 
sequences  {>)',/*)*  and  {>£//*+ 1}*  will  not  be  very  different 
for  a  well-chosen  depth  k0.  In  other  words,  long  survivor 
sequences  tend  to  have  one  common  trunk  up  to  K  —  k0, 
at  which  point  they  may  diverge  as  illustrated  in  Fig.  6. 
Thus  we  may  use  ^*_*o/*  as  a  final  estimate  of  since 
ix-ko/K+i  ^  ix-ko/K  f°r  positive  /.  Thus  as  a  practical 
matter,  one  may  choose  a  depth  constant  &0  such  that  the 
sequence  of  fixed-lag  estimates  <fk-ko/k ,  k  =  k0  +  1;  A;0  + 
2,  -  •  gives  an  approximate  MAP  sequence.  Here  ^*-*0/* 
is  simply  the  phase  value,  k0  samples  back,  in  the  MAP 
sequence  based  on  measurements  up  to  time  k.  In  this  way, 
phase  values  are  estimated  with  delay  and  only  survivor 
sequences  of  length  k0  must  be  stored. 

How  should  Ic0  be  chosen?  This  is  a  difficult  question  to 
answer  precisely,  because  no  analytical  results  exist  for  the 
performance  of  nonlinear  phase  trackers  of  the  Viterbi-type. 
We  can,  however,  study  the  filtering  behavior  of  a  related 
linear  problem  and  find  how  performance  varies  with 
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Fig.  6.  Illustration  of  survivor  evolution  with  common  trunk. 


fixed-lag  k0.  To  this  end,  we  consider  the  problem  of 
tracking  phase  when  there  is  no  data  symboling.  Assume 
{ipk)  is  a  normal  random  walk  of  the  form  (6)  with 
wk  :  NWi( 0,  o^).  Let  xk  =  eJ*k  +  nk,  {n*}  be  a  sequence  of 
complex  random  variables  whose  real  and  imaginary  parts 
are  i.i.d.  Nn( 0,  <j„2)  random  variables.  A  PLL  with  gain  A", 
for  estimating  {^}  is  the  following: 

4  =  4- 1  +  Ki  \xk  | sin  (arg xk  -  4-i)-  (47) 

Note  that  this  is  similar  to  (13)  when  there  is  no  data. 

For  a2  •*  1  we  approximate  (47)  with 

4  =  4-i  +  *i(arg**  - 4- 1)-  (48) 

When  K ,  is  selected  to  be 

=  (°Z/°n2)[- 0-5  +  0.5(1  +  4 g>2)I/2J,  (49) 

then  (48)  is  the  Kalman  filter  for  the  “linear  observation 
model" 

arg**  =<('*+«*-**=  exp[y(^  +«*)].  (50) 

The  steady-state  filtering  error  P0  for  this  linear  problem  is 
related  to  /C,  as  follows: 


A  general  result  due  to  Hedelin  [12]  for  fixed-lag 
smoothing  may  be  adapted  to  random  walk  smoothing 
from  observations  of  the  form  (50).  The  steady-state  fixed- 
lag  smoothing  variance  Pko  at  delay  k0  is 

*0 

PkM  =  p0/oi  -  2g2' 

/=  i 

=  P0/°Z  -G2(l  -g2*°)/(i  -g2) 

G=\-Kx.  (52) 

The  infinite-lag  smoothing  variance  is 

Px/°Z=Po/°Z  -  G2/(l  -G2).  (53) 

In  Fig.  7  several  error  expressions  and  asymptotic  forms 
are  plotted  versus  o£/o2,  which  is  a  kind  of  SNR.  For 
large  0„2/on2,  the  error  variances  P0/o2,  Pw/o 2,  and  Px  /a2 
go  as  (,o2/a2)~\  For  small  o2/o„2,  they  go  as  (o2/o2)~,/2 
although  infinite-lag  smoothing  offers  6  dB  improvement  in 
a2/ a2  over  zero-lag  smoothing  for  a  fixed  smoothing  vari¬ 
ance.  Over  the  range  of  values  0.01  s  a2 /a2  <  10,  a  delay 


of  k0  =  10  offers  all  but  l-2dB  of  the  theoretically 
achievable  gain  from  infinite  delay.  In  communication 
problems  for  which  random  phase  is  a  significant  effect, 
the  ratio  a 2 /a2  is  typically  in  this  range.  Only  at  very  small 
values  of  a2 /a2  can  very  large  delays  k0  provide  large 
performance  gains,  but  in  this  case  there  is  no  real  phase 
fluctuation  problem  for  the  purpose  of  data  decoding,  and 
the  gain  is  not  worth  the  large  delay.  Shown  also  in  Fig.  7 
is  the  Kalman  gain  K ,  versus  a2 /a2. 

The  problem  considered  in  Section  IV  is  admittedly 
different  from  the  linear  problem  considered  here.  How¬ 
ever,  the  numerical  results  given  in  Fig.  8  for  the  Viterbi 
phase  tracker  illustrate  that  the  performance  gain  to  be 
achieved  with  a  fixed-lag  of  k0  =  10  is  much  as  predicted 
by  the  linear  theory.  For  the  results  of  Fig.  8,  the  phase 
space  was  discretized  to  m  =  48  values,  data  transmission 
was  8-ary  PSK,  and  the  decoding  algorithm  was  the  VA. 
The  circles,  dots,  and  squares  represent  experimental  phase 
estimation  error  variances,  and  the  heavy  solid  lines  repre¬ 
sent  theoretical  results.  Over  the  range  of  values  0.1  < 
a2 /a2  <  2,  the  phase  estimator  variance  for  the  Viterbi 
phase  tracker  operating  with  delay  k0  =  10  is  essentially 
equivalent  to  the  filtering  variance  of  a  Kalman  filter  that 
has  access  to  linear  observations  and  provides  estimates 
without  delay.  Performance  is  not  measurably  degraded  by 
the  presence  of  data  which  are  concurrently  decoded. 

VII.  Simulation  Results:  Gaussian  Increments 

For  all  simulation  results  discussed  in  this  section  the 
phase  space  [-rr,  ir)  has  been  discretized  to  48  equally 
spaced  phase  values  and  a  Viterbi  algorithm  has  been 
programmed  to  solve  the  MAP  sequence  estimation  prob¬ 
lem.  The  principle  of  optimality  established  in  Section  V 
has  been  used  to  derive  the  appropriate  path  metric  and 
thereby  reduce  computational  complexity.  The  choice  of  a 
fixed-lag  decoding  (or  depth)  constant  is  £0  =  10.  Source 
symbols  have  been  generated  independently.  The  random 
phase  sequence  has  been  governed  by  the  independent 
increments  model  of  (6)  with  wk  :  N( 0,  a2)  and  initial  phase 
uniformly  distributed  on  [-w,  n).  Initial  phase  acquisition 
has  been  achieved  by  transmitting  a  preamble  according  to 
one  of  the  following  schemes. 

a)  During  a  pretransmission  period  of  length  N,  the 
sequence  of  transmitted  data  is  known  to  the  receiver.  Thus 
in  the  DBVA  and  VA  systems,  based  upon  MAP  estima¬ 
tion,  the  Viterbi  algorithm  works  as  a  pure  phase  estimator 
during  this  period.  At  the  end  of  the  preamble,  the  Viterbi 
algorithm  is  turned  into  a  joint  phase-data  MAP  estimator. 
In  the  DDPLL  and  JE  systems,  based  upon  decision- 
directed  algorithms,  the  algorithm  is  directed  by  the  true 
data  during  the  preamble  period. 

b)  During  the  preamble  period,  identical  (but  unknown) 
data  are  transmitted.  This  keeps  the  phase  from  making 
phase  jumps  associated  with  symbol  changes  and  makes 
the  joint  phase-data  estimator  able  adequately  to  acquire 
the  initial  phase. 

In  our  simulations  the  VA  has  achieved  the  same  data- 
error  probability  for  both  methods;  i.e.,  its  performance 
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Fig.  7.  Linear  performance  results  for  evaluating  effects  of  fixed- lag  ka. 


Fig.  8.  Selected  phase  tracking  variances  with  and  without  data  trans¬ 
mission. 

has  not  depended  upon  which  learning  procedure  was 
used.  On  the  other  hand,  Ungerboeck’s  DBVA  have  proved 
to  be  sensitive  to  the  learning  procedure.  For  example,  at 
SNR  =  20  dB  with  phase  variance  o 2  =  4  o„2  for  a  learning 
period  of  N  =  60  data,  the  number  of  errors  during  a 
transmission  period  of  490  data  values  has  jumped  from 
seven  for  procedure  a)— known  data— to  59  for  procedure 
b) — constant  but  unknown  data.  Moreover,  the  DBVA 
typically  requires  a  longer  learning  period  than  does  the 
VA  (roughly  two  times  longer).  A  value  of  N  =  50  is 
sufficient  for  the  VA,  while  the  DBVA  needs  N  =  100 
learning  iterations  in  our  simulations.  The  decision-directed 
systems  (DDPLL  and  JE)  work  as  the  VA  in  these  respects. 
That  is,  a  preamble  period  of  50  data  values  is  sufficient. 
These  data  may  be  unknown  to  the  receiver,  provided  they 
are  kept  constant  (procedure  b».  No  degradation  with 
respect  to  procedure  a)  results. 


Fig.  9.  Symbol  error  probabilities  for  binary  symboling. 
Binary  Symboling 

Shown  in  Fig.  9  are  binary  symboling  results  for  the  VA 
when  o2  —  0.01  rad2(ow  =  5.7°)  and  SNR  ranges  from  4 
to  10  dB.  (Recall  SNR  =  101og)Q  l/2o2.)  The  results  indi¬ 
cate  that  performance  with  the  VA  is  essentially  equivalent 
to  that  of  a  fully  coherent  receiver,  even  for  a  relatively 
large  value  of  aw.  For  comparison,  the  curves  for  coherent 
binary  orthogonal  and  coherent  binary  antipodal  systems 
are  also  shown.  The  simulation  results  for  binary  orthogo¬ 
nal  symboling  are  interesting  because  they  serve  to  validate 
the  simulation.  Indeed,  as  expected,  the  performance  of  the 
VA  is  seen  in  Fig.  9  to  lie  between  that  of  an  incoherent 
receiver  and  that  of  a  fully  coherent  receiver.  Of  course,  the 
margin  between  coherent  and  incoherent  performance  is 
small  at  SNR’s  of  practical  interest  The  simulation  results 
for  binary  antipodal  symboling  are  interesting  on  their  own 
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Fig  10.  Symbol  error  probabilities  for  eight-PSK. 


because  incoherent  reception  is  not  possible  with  antipodal 
symboling. 

Eight-PSK 

Shown  in  Fig.  10  are  simulation  results  for  eight-PSK 
when  SNR  ranges  from  16-19dB  and  (o^o/)l/J  remains 
fixed  at  4.4  X  10  _3  rad2.  This  choice  of  parameters  corre¬ 
sponds  to  a  loop  SNR  of  23  dB  where  the  PLL  is  well  into 
its  linear  region  of  operation  and  little  can  be  gained  from 
improvements  to  the  phase  tracking.  The  values  of  o2 
under  investigation  range  from  1.6°  to  2.2°  and  the  ratio 
o2/o2  is  very  small,  ranging  from  0.03  to  0.12.  The  solid 
circles  of  Fig.  10  correspond  to  the  VA,  and  the  solid 
triangles  correspond  to  the  markedly  simpler  JE.  Also 
shown  in  Fig.  10  are  performance  bounds  for  fully  coher¬ 
ent  eight-PSK  and  16-PSK  symboling.  In  this  case  neither 
the  VA  nor  the  DBVA  provides  significant  improvement 
over  the  JE  or  DDPLL.  The  latter  two  receivers  are  simpler 
than  the  DBVA  which,  in  turn,  is  simpler  than  the  VA. 
Therefore,  for  such  cases  of  weak  phase  noise,  neither  the 
VA  nor  the  DBVA  would  be  favored  over  the  JE  or  the 
DDPLL. 

16-QASK 

Shown  in  Fig.  1 1- 13  are  simulation  results  for  16-QASK 
symbols  encoded  according  to  the  (4,  4)  CCITT  rule.  The 
decoding  procedure  are  JE,  DDPLL,  DBVA,  and  VA,  for 
three  distinct  values  of  the  ratio  e2/o„2.  Fig.  11  is  con¬ 
cerned  with  a  weak  phase  noise  («2/<j„2  =  0.25).  Fig.  12  is 
concerned  with  an  average  phase  noise  (e2/e„2  —  1),  and 
Fig.  13  is  concerned  with  a  large  phase  noise  (o2/o„2  =  4). 
We  recall  [1]  that  the  DBVA  performs  some  kind  of  phase 
estimation  along  a  path  that  satisfies 

(54) 


SNR 

Fig.  II.  Symbol  error  probabilities  for  16-QASK-CCITT  symboling  (low 
phase  noise). 


SNR 


Fig.  12.  Symbol  error  probabilities  for  16-QASK-CCITT  symboling 
(average  phase  noise). 


SNR 

Fig  13.  Symbol  error  probabilities  for  16-QASK-CCITT  symboling 
(high  phase  noise). 


using  a  Viterbi  algorithm.  The  DBVA  that  we  have  simu¬ 
lated  is  somewhat  different  from  Ungerboeck’s  DBVA,  in 
which  the  number  of  possible  phase  states  at  each  iteration 
is  limited  to  six  or  eight.  In  our  simulation  the  number  of 
phase  states  is  not  limited,  thus  avoiding  one  possible  cause 
of  errors  and  improving  the  error  rate,  but  also  increasing 
the  computational  complexity  with  respect  to  [1). 
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Behavior  of  DDPLL  and  JE  on  CCITT  (4,  4)  Constellation 

The  decision-directed  algorithms  (DDPLL  and  JE)  have 
essentially  the  same  performance,  as  shown  in  Figs.  11-13. 
The  DDPLL  is  superior  to  the  JE  by  only  0.5  dB.  The 
slight  inferiority  of  the  JE  is  largely  compensated  by  the 
fact  that  the  complex  gain  of  the  JE  can  also  correct  rapid 
gain  fluctuations  in  the  channel.  We  emphasize  that  the 
curves  of  the  DDPLL  and  JE  are  biased  and  cannot  be 
trusted  just  as  they  are  because  of  the  occurrences  of  very 
large  bursts  of  errors  at  relatively  high  error  probabilities. 
When  such  bursts  have  occurred  in  the  simulation  runs, 
they  have  been  withdrawn  from  the  error  rate  computation. 
For  instance,  with  of  =  0.25 of  and  SNR  —  17 dB,  at  an 
error  probability  on  the  order  of  10  ~2,  between  one  fourth 
and  one  third  of  the  simulation  runs  (with  length  500  data 
values)  have  exhibited  bursts  of  about  a  hundred  errors.  In 
the  simulations,  the  bursts  began  to  occur  at  SNR  =  18  dB, 
21.5 dB,  and  26 dB  for  of /of  =0.25,  I,  and  4,  respec¬ 
tively.  This  corresponds  to  a  value  of  ow  such  that  4 ow 
ranges  between  11.5°  and  20°.  The  phenomenon  of  error 
bursts  can  be  explained  as  follows:  because  the  phase 
increment  is  Gaussian  it  will  occasionally  reach  the  value 
4 ow.  If,  at  the  same  time,  the  noise  is  relatively  large,  the 
angle  between  the  observed  data  and  the  transmitted  sym¬ 
bol  will  exceed  the  value  22.5°  that  corresponds  to  the 
angular  threshold  for  an  error  in  the  16-point  CCITT 
diagram  (see  Figs.  2(d)  and  4(a)).  No  type  of  decision- 
directed  phase  estimator  can  correct  such  an  error.  There¬ 
fore,  the  phase  estimate  will  become  incorrect  (by  a  shift  of 
±45°),  causing  a  group  of  errors.  In  turn,  due  to  the 
decision-directed  nature  of  the  phase  estimator,  error  mul¬ 
tiplication  occurs,  resulting  in  an  error  burst.  The  impor¬ 
tance  of  the  burst  phenomenon  in  the  decision-directed 
algorithms  can  be  appreciated  from  Table  1.  The  table 
gives  observed  burst  frequency  in  runs  of  500  samples, 
parameterized  by  the  corresponding  observed  probability 
of  an  isolated  error.  The  results  are  given  for  a  DDPLL, 
but  they  are  essentially  the  same  for  the  JE.  Moreover,  the 
results  are  relatively  independent  of  the  ratio  of  /a„2  in  the 
range  0.25-4.  A  decision-directed  algorithm  is  not  a  relia¬ 
ble  phase  estimator  when  the  error  probability  reaches  the 
level  of  10  ~2,  corresponding  to  severe  transmission  chan¬ 
nels.  With  respect  to  burst  phenomena,  the  DDPLL  and  JE 
behave  similarly. 

Behavior  of  DBVA  and  VA  on  CCITT  (4,  4)  Constellation 

The  performance  of  the  VA  is  superior  to  that  of  the 
DBVA.  The  gain  achieved  by  the  VA  over  the  simpler 
DBVA  is  monotone  increasing  in  the  ratio  of  phase 
fluctuation  variance  of  to  additive  noise  variance  of.  While 
there  is  no  gain  when  of /of  =  0.25,  the  gain  is  1  dB  for 
o 2 /of  =  1  and  2dB  for  of /of  =  4.  Both  systems  perform 
better  than  the  DDPLL  or  JE,  the  improvement  again 
being  a  monotone  increasing  function  of  of /of. 

A  very  important  point  is  that  the  use  of  either  of  the 
two  MAP  phase  estimators  precludes  the  occurrence  of 
error  bursts.  The  errors  seem  to  be  grouped  in  twos  or 


TABLE  I 
Observed  Bursts 


Pe 

(Isolated 

Error) 

<  10"4 

2.0  x  10 'J 

1.5  x  10 -1 

Percent  bursts 

0 

7 

20 

threes,  and  no  error  multiplication  occurs  since  the  phase 
estimator  is  not  decision-directed.  Thus  such  MAP  se¬ 
quence  estimators  can  be  used  even  at  high  error  probabili¬ 
ties  on  the  order  of  10“ 2  or  10 

Comparison  Between  MAP  and  Decision-Directed  Phase 
Estimators 

The  improvement  that  can  be  gained  by  using  any  type 
of  MAP  estimator  for  phase  rather  than  a  simple  decision- 
directed  algorithm  is  again  an  increasing  function  of  of /of. 
Fig.  1 1  shows  that  only  1  dB  is  gained  by  the  DBVA  and 
the  VA  over  the  DDPLL  if  of  =  0.25  of.  This  gain  is 
realized  at  a  high  computational  price.  For  the  phase 
fluctuations  and  additive  noise  of  the  same  importance 
(of  /of  =  1),  the  VA  outperforms  the  DDPLL  by  3dB 
(see  Fig.  12),  but  the  gain  is  reduced  to  2  dB  for  the  simpler 
DBVA.  For  large  phase  fluctuations,  the  gain  is  important. 
For  instance.  Fig.  13  shows  that  the  VA  outperforms  the 
DDPLL  by  5  dB  when  of /of  =  4.  In  addition,  the  VA 
brings  the  insurance  that  no  burst  of  errors  can  occur,  even 
for  very  poor  SNR  and  large  phase  fluctuations.  In  fact, 
the  true  power  gain  of  the  VA  over  a  decision-directed 
algorithm  is  even  higher  than  just  claimed  if  one  takes 
account  of  the  additional  power  required  in  the  decision- 
directed  schemes  to  ensure  against  burst  as  well  as  random 
errors. 

Sensitivity  to  Imperfect  Knowledge  of  of  /of 

It  is  easily  seen  in  (18)  or  (37)  that  the  only  parameter 
required  in  order  to  proceed  with  the  VA  algorithm  is  the 
ratio  of  phase  variance  to  additive  noise  power.  The  same 
holds  for  the  DDPLL  whose  optimal  gain  X,  depends  on 
this  ratio  (see  (49)),  and  for  the  JE  whose  step-size  p  (see 
(14))  is  to  be  kept  close  to  X,,  but  smaller,  provided  the 
data  diagram  has  unit  power.  As  for  the  DBVA,  it  requires 
only  the  knowledge  of  of  in  order  to  determine  the  number 
m  of  discretized  phase  levels.  Thus  an  important  feature  of 
each  system  is  its  sensitivity  to  an  imperfect  knowledge  of 
of /of  (or  of)  because,  first,  of  can  vary  with  time  and, 
second,  the  actual  phase  can  fluctuate  according  to  a 
statistical  model  that  is  different  from  the  one  expected. 
The  less  sensitive  the  system  is  to  the  knowledge  of  of  /of 
(or  of),  the  more  robust  it  is. 

a)  Sensitivity  of  the  Decision-Directed  Systems:  Let  us 
denote  of  /of  by  a.  The  function  X,(a)  that  gives  the 
optimum  loop-gain  of  the  DDPLL  is  sketched  in  Fig.  14.  It 
is  quite  flat  except  for  a  very  close  to  zero  (e.g.,  a  <  0.2). 

Now  the  case  a  «  I  is  of  no  real  interest  for  the  purpose 
of  this  paper.  Indeed,  it  has  been  seen  previously  that,  in 
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Variance  to  Noise  Varianc* 

Fig.  14.  Optimum  gain-loop  of  DDPLL. 


this  case,  no  MAP  phase  estimator  is  worth  being  worked 
out.  Moreover,  any  reasonable  phase  estimator  will  per¬ 
form  satisfactorily.  When  a  is  not  negligible,  A,(or)  is 
slowly  varying.  For  example,  A,(l)//f,(0.25)  =  1.59,  and 
A'1(4)/A',(l)  =  1.34.  Thus  the  value  AT,(1)  =  0.62  for  the 
DDPLL  gain  is  correct  for  a  large  range  of  values  of  a. 
This  fact  is  largely  confirmed  by  the  simulations.  Hence, 
due  to  the  risk  of  error  multiplication  that  increases  very 
rapidly  with  Xj,  it  should  rather  be  set  to  the  lower  bound 
K|(amm)  corresponding  to  the  smallest  a  that  can  be 
expected,  rather  than  to  an  average  value  /£■,(<*,„).  which 
will  sometimes  be  too  large  and  bring  error  bursts.  Thanks 
to  this  precaution,  the  DDPLL  is  insensitive  to  a.  It  is  a 
robust  system. 

The  robustness  of  the  JE  is  also  excellent.  This  fact  was 
checked  on  numerous  computer  simulations:  as  a  function 
of  the  step  size  /i,  the  error  probability  /*(£;  ji)  exhibits  a 
minimum  which  is  very  flat,  as  sketched  in  Fig.  15.  The 
range  where  the  minimum  is  reached  does  not  depend 
critically  upon  a.  A  value  such  as  n  =  0.4  corresponds  to 
the  minimum  of  error  probability  for  a  in  the  range 
[0.25-1]  and  for  a  unit  energy  data  diagram. 

b)  Sensitivity  of  the  MAP  Phase  Estimators:  The  VA 
sensitivity  to  imperfect  knowledge  of  a  has  been  tested  in 
our  computer  simulations.  It  appears  that  the  VA  perfor¬ 
mance  is  not  appreciably  degraded  by  an  error  of  ±6dB 
for  a.  Hence  the  VA  robustness  is  at  least  as  good  as  that 
of  the  decision-directed  algorithms. 

On  the  other  hand,  the  DBVA  robustness  has  turned  out 
to  be  poor.  For  instance,  with  SNR  =  21  dB  and  a  =  4, 
the  DBVA  is  supposed  to  work  with  m  =  2ir/ow  =  50 
phase  levels.  If  only  45  levels  are  used,  corresponding  to  a 
0.9  dB  error  for  a,  then  the  error  probability  is  increased  by 
a  factor  of  two.  In  fact,  as  a  function  of  m,  P(E;  m) 
exhibits  a  minimum,  but  it  is  a  sharp  minimum.  This  poor 
robustness  can  be  understood  by  noting  that  in  the  DBVA, 
the  path  metric  is  not  a  function  of  a  =  a2/^2,  but  only  of 
oj.  This  may  be  one  of  the  main  drawbacks  of  the  DBVA. 

VIII.  Simulation  Results:  Bounded-Increments 
Phase  Jitter 

For  all  simulation  results  of  this  section  the  phase  space 
(-it,  w)  has  been  discretized  to  32  equally  spaced  phase 
values,  and  a  VA  has  been  programmed  to  solve  (17).  The 


assumed  increment  density  h(w)  is  the  uniform  density 

*(")  =  {i’  —  a  ■<  w  <  a\  2a  =  2w/16  (55) 

(.0,  otherwise. 

The  corresponding  discrete  transition  density  for  use  in  the 
path  metric  is 


/(♦*/♦*-.)  =  {‘A 


<t>k  ~  4k- i  =  -ir/16,0,  w/16 
otherwise. 


(56) 


The  resulting  VA  is  related  to  the  class  of  so-called  M 
algorithms  [16]— [18]  in  which  all  survivors  are  saved,  but 
only  M  (in  this  case  3)  candidate  originator  states  are 
allowed.  This  significantly  reduces  calculations  and  results 
in  an  algorithm  similar  in  spirit  to  the  DBVA  of  [1].  Still, 
however,  phase  is  tracked  only  on  [  — w,  v)  rather  than  on 
(-00,00). 

Source  symbols  have  been  generated  independently  from 
a  four-PSK  alphabet  and  used  to  differentially  encode 
phase  according  to  a  Gray  code.  The  random  phase  se¬ 
quence  has  been  generated  in  ways  to  be  discussed  below. 


Markov  Phase  with  Non-Gaussian  Increments 

Here  the  phase  is  generated  according  to  (6)  with  h(w) 
given  by  (55).  Thus  the  algorithm  is  matched  to  the  actual 
phase  sequence.  Shown  in  Fig.  16  are  performance  results 
for  the  VA  and  for  the  JE.  The  VA  outperforms  the  JE  by 
1.5  dB  over  the  range  10  dB  <  SNR  <  15  dB.  The  proba¬ 
bility  of  error  is  “probability  of  bit  error.” 

Sinusoidal  Phase  Jitter 

Here  the  phase  jitter  is  sinusoidal  (see  (4))  with  uni¬ 
formly  distributed  initial  phase  and  frequency  r.  The 
frequency  is  chosen  such  that  >>S  =  1  /24,  corresponding  to 
a  transmission  rate  of  4800  bits/s  with  baud  rate  1  /A  = 
2400  Hz  and  jitter  frequency  v  =  100  Hz.  The  runs  are 
2000-10,000  steps  long,  corresponding  to  4000-20,000 
transmitted  bits.  The  peak-to-peak  phase  deviation  is  20° 
or  60°.  For  these  experiments  the  VA  outperforms  the  JE 
by  1.5- 1.7  dB.  This  gain  is,  of  course,  achieved  at  a  high 
price  in  complexity. 
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Fig.  16  Symbol  error  probabilities  for  4-PSK  and  non-Gaussian  phase 
increments. 

Comparison  of  the  JE  and  VA 

In  the  simulations  reported  above,  the  ratio  a  =  o*/e„2 
ranges  from  0.02  to  0.81,  that  is  from  small  to  average 
values.  No  burst  of  errors  has  ever  been  observed  for  the 
JE.  This  is  due  to  the  fact  that  the  phase  increment  is 
always  bounded  as  appears  in  (4)  and  also  (SS).  The  bound 
is  much  smaller  than  the  angular  distance  between  adjacent 
data.  Thus  there  is  no  risk  of  a  ±90°  slip  (corresponding 
to  the  four-PSK.  diagram)  in  the  JE  phase  estimation. 
Hence  the  errors  will  be  scattered  rather  than  grouped,  and 
no  error  multiplication  phenomenon  can  happen. 

Owing  to  this  consideration,  to  the  fact  that  the  VA 
outperforms  the  JE  by  only  1.5  dB,  and  to  the  complexity 
of  the  VA,  a  practical  system  will  implement  the  JE  (or 
DDPLL)  rather  than  the  VA  (or  DBVA),  in  the  case  of 
bounded  increment  phase  jitter. 

IX.  Conclusion 

We  have  derived  a  principle  of  optimality  for  phase- 
amplitude  encoded  symboling  that  allows  one  to  simulta¬ 
neously  track  random  phase  and  decode  data  symbols 
using  the  VA  derived  in  [8]  and  (9).  The  VA  is  designed  for 
a  random  walk  phase  process,  a  very  severe  type  of  phase 
process.  In  such  a  process  there  exists  the  possibility  of 
large  phase  jumps.  The  VA  gives  excellent  performance 
because  it  benefits  from  the  use  of  a  lag  to  observe  future 
data  samples  which  make  large  phase  jumps  look  unlikely. 

In  order  to  reach  conclusions  about  the  type  of  phase 
estimation  that  should  be  used  for  given  types  of  phase 
fluctuations,  performance  comparison  of  the  VA  with  two 
simple  decuior  -  directed  (zero-lag)  phase  estimators, 
namely,  the  JE  of  [3]  and  the  DDPLL  of  (5),  and  with  the 


TABLE  II 

Choice  of  Phase  Tracker 


a 

P(E) 

Small 

Large 

case  1 

case  2 

Small 

JE  or 

see 

DDPLL 

Table  IU 

case  3 

case  4 

Large 

see 

VA  or 

Table  III 

DBVA 

TABLE  III 

Choice  of  Phase  Tracker:  Continued 

c  Small 

Large 

Small 

Large 

JE  or 
DDPLL 

DBVA 

JEor 

DDPLL 

VA  or 
DBVA 

DBVA  of  [1],  have  been  thoroughly  investigated  by  com¬ 
puter  simulations,  with  various  data  diagrams.  They  indi¬ 
cate  that  the  choice  among  the  four  systems  is  to  be  made 
according  to  four  parameters: 

1)  the  error  probability  P(E)  at  which  the  system  is  to 
be  used: 

2)  the  relative  importance  a  =  a2/o„2  of  phase  fluctua¬ 
tions  with  respect  to  additive  noise; 

3)  the  complexity  c  that  is  technologically  feasible  and 
acceptable; 

4)  the  maximum  phase  increment  that  is  to  be 

expected,  as  compared  to  the  angular  distance  be¬ 
tween  points  of  the  data  diagram. 

Suggestions  for  this  choice  are  sketched  in  Tables  II  and 
III  where  Table  III  is  concerned  with  cases  2  and  3  of 
Table  II. 

The  choice  between  the  two  decision-directed  phase 
estimators,  JE  or  DDPLL,  is  irrelevent  for  the  matters 
discussed  in  this  paper.  It  appears  in  Tables  II  and  III  that 

the  VA  and  DBVA  are  preferred  when  a,  P(E),  and  A4> _ 

are  large.  The  comparison  between  these  two  MAP  phase 
estimators  shows  that  the  VA  is  more  robust,  has  a  smaller 
learning  period,  and  outperforms  the  DBVA  by  2dB  or 
more  when  a  is  at  least  equal  to  four. 

Only  Viterbi,  or  Viterbi-like,  algorithms  can  survive  and 
correct  error  bursts  by  effectively  using  the  weight  of  future 
evidence  to  render  such  bursts  too  unlikely  to  occur.  Thus 
it  seems  likely  that  the  VA  (really  dynamic  programming) 
will  grow  in  importance  in  such  applications  as  spread 
spectrum  communication  where  the  phase  of  a  wide-band 
carrier  can  be  tracked  for  symbol  decoding. 
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Appendix 

Monotonicity  of  Folded  Normal  and  Cauchy 
Densities 


There  are  many  choices  for  the  phase  increment  density  h(x) 
that  are  physically  interesting  and  mathematically  tractable.  Two 
of  particular  interest  are  the  normal  density  and  the  Cauchy,  the 
latter  being  useful  in  the  modelling  of  “heavy-tailed”  behavior. 
When  folded  around  the  unit  drcle  according  to  (8)  these  densi¬ 
ties  yield  transition  densities  which  achieve  their  maximum  at 
$k  —  <t>k- 1  -  0  and  decrease  monotonically  on  the  interval  0  £ 
♦*  —  ♦*-!  S  n. 

Consider  first  the  Cauchy  case 


*■(*) 


_ a/r _ 

a2  +  (x  +  k2ir)2 


(57) 


According  to  Poisson’s  summation  formula  [13],  this  may  be 
written 


*'(*>  =  2 w  2  *  <"*1  «'k* 

*=  -00 

=  ^(1  -e"2“)(l  -  2e~acosx  +  e-2a)~'.  (58) 

This  function  achieves  its  maximum  value  at  zero  and  decreases 
monotonically. 

In  the  normal  case 

gi(x)  =  2  (2iro2)  l/2exp  {  —  (x  +  2kn)2/2o2} . 

*=  -oo 

(59) 

Again,  by  Poisson’s  summation  formula, 

*i(a)  =  2  (2ir )  '  exp  (jkx  —  k2o2/2) .  (60) 

k-  ~  oo 

This  infinite  sum  goes  by  the  name  /,(*,  q  =  e~'1/2)  in  the 
theory  of  Jacobian  elliptic  functions  and  theta  functions  [15].  The 
theta  function  J,(x,  q)  is  known  to  be  monotonically  decreasing 
on  the  interval  0  £  x  £  rr. 
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Aspects  of  Dynamic  Programming  in 
Signal  and  Image  Processing 
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Abstract— The  techniques  peculiar  to  dynamic  programming  have  found 
a  variety  of  successful  applications  in  the  theoty  and  practice  of  modem 
control.  Successes  in  the  theory  and  practice  of  signal  and  image  process¬ 
ing  are  less  numerous  and  prominent,  but  they  do  exist.  In  this  paper,  we 
sound  a  call  for  renewed  attention  to  the  potential  of  dynamic  programming 
for  solving  knotty  nonlinear  filtering  problems  in  signal  and  image  process¬ 
ing,  and  outline  successes  we  have  recently  enjoyed  in  nonlinear  frequency 
tracking  and  random  boundary  estimation  in  noisy  black  and  white  images. 
Two  classical  results  the  fast  Fourier  transform  and  Levinson’s  recursion 
for  determining  autoregressive  parameters  are  treated  in  the  context  of 
dynamic  programming  simply  to  reinforce  the  point  that  many  of  the 
algorithms  we  take  for  granted,  and  which  were  derived  without  recourse  to 
dynamic  programming,  can  be  nicely  interpreted  as  dynamic  programming 
algorithms 

I.  Introduction 

IN  THIS  PAPER  it  is  our  aim  to  show  that  dynamic 
programming,  a  fundamental  technique  in  control  theory 
since  Bellman’s  introduction  and  advocacy  of  it  in  the 
mid-I950’s,  can  be  of  considerably  more  value  in  signal 
and  image  processing  than  has  generally  been  recognized. 
This  is  not  to  say  others  have  failed  to  recognize  the 
potential  of  dynamic  programming  for  solving  interesting 
signal  processing  problems.  We  mention  in  particular  Cox’s 
early  work  [I],  [2]  on  Kalman  filtering  and  dynamic  pro¬ 
gramming  for  the  estimation  of  state  variables  and  the 
identification  of  system  parameters;  Viterbi's  dynamic  pro¬ 
gramming  algorithm  for  decoding  convolutional  code  se¬ 
quences  [3];  Cahn’s  dynamic  programming  algorithm  for 
FM  demodulation  [4J;  and  Forney’s  discussion  of  inference 
problems  on  finite-state  Markov  sequences  that  can  be 
solved  with  the  techniques  of  dynamic  programming  (5). 

In  the  sections  to  follow  we  rederive  classical  algorithms 
in  discrete  Fourier  analysis  and  linear  prediction  using  the 
principle  of  dynamic  programming.  We  then  present  two 
new  dynamic  programming  algorithms.  One  is  for  nonlin¬ 
ear  frequency  tracking  and  the  other  is  for  edge  detection 
in  noisy  black  and  while  images. 

The  organization  is  as  follows.  In  Section  II,  we  present 
an  elementary  dynamic  programming  formalism.  In  Sec- 
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tion  HI  we  use  dynamic  programming  arguments  to  rede¬ 
rive  the  Goertzel  and  decimation-in-frequency  fast  Fourier 
transform  (FFT)  algorithms  for  efficiently  computing  the 
discrete  Fourier  transform  (DFT).  In  Section  IV,  we  dis¬ 
cuss  the  connections  between  control,  detection,  estima¬ 
tion,  and  prediction  of  autoregressive  sequences  observed 
in  additive  noise.  We  highlight  the  central  role  played  by 
the  so-called  normal  equations  and  rederive  the  Levinson 
algorithm  for  recursively  solving  them  in  the  order  of  p 2 
operations.  The  derivation  is  a  dynamic  programming  one. 

The  new  results  follow  in  Sections  V  and  VI.  In  Section 
V,  a  dynamic  programming  algorithm  for  tracking  the 
frequency  of  a  frequency  modulated  sequence  in  additive 
noise  is  derived.  Several  simulations  illustrate  the  perfor¬ 
mance  of  the  algorithm.  This  provides  a  solution  to  a 
classical  nonlinear  filtering  problem.  The  results  of  Section 
VI  show  how  dynamic  programming  may  be  used  to  derive 
a  new  algorithm  for  estimating  local  segments  of  object 
boundaries  in  noisy  black-and-white  images.  Some  exam¬ 
ples  are  given  to  illustrate  the  use  of  the  algorithm  in 
estimating  complete  object  boundaries  as  well. 

II.  A  Dynamic  Programming  Formalism 

Traditionally,  dynamic  programming  has  been  used  to 
find  “optimum”  solutions  to  multistage  decision  problems 
[6],  [7],  An  “optimum”  solution  has  generally  been  one  that 
maximizes  or  minimizes  a  performance  or  cost  functional. 
When  the  multi-stage  decision  problem  is  cast  in  a  prob¬ 
abilistic  framework  and  the  criterion  of  optimality  is  maxi¬ 
mum  a  posteriori  (MAP)  probability,  then  the  cost  func¬ 
tional  is  typically  a  multivariable  likelihood  function  or 
some  monotone  function  of  it. 

The  following  is  a  formalism  that  is  rich  enough  to 
embrace  most  of  the  "signal-in-noisc”  problems  encoun¬ 
tered  in  signal  and  image  processing.  Let  {x*}**,  denote  a 
process  with  state  variable  representation 

!=/*(**.«*) 

.v*  =«*(**)•  0) 

Here  fk  and  gk  may  be  random  functions;  the  sequence 
[uk }  is  a  parameter,  decision,  or  control  sequence  that  may 
be  functionally  dependent  on  the  measurement  sequence 
{>>*}.  The  range  spaces  for  the  state  xk,  the  parameter  uk, 
and  the  measurement  yk  are  X U,  Y,  respectively.  These 
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Fig.  I .  States  and  characters. 


spaces  may  be  finite,  countable,  or  noncountable.  When 
the  spaces  X  and  U  are  countable  then  their  respective 
elements  may  be  placed  in  one-to-one  correspondence  with 
the  integers  and  the  formalism  of  Markov  chain  theory 
may  be  mined.  Even  though  the  states  of  X  may  be  chosen 
abstractly  and  appear  uninteresting,  the  mapping  gk  may 
be  chosen  so  that  the  signal  component  of  gk(  • )  generates 
characters  or  observations  Ck  that  are  of  great  interest.  The 
idea  is  simply  to  let  a  Markov  chain,  say  on  the  integers 
0, 1,2,  -  -  -  ,  control  the  dynamical  state  of  the  problem  and 
reserve  the  role  of  character  or  observation  generation  for 
the  observation  mechanism  g*(  - ).  This  point  is  illustrated 
in  Fig.  1  where  the  generated  characters  can  be  almost 
anything:  contours,  sequences,  images,  etc. 

Consider  a  finite  version  of  the  process  { **}-«,: 

*/V=(*0.  •*!•'■*  ’XN  ) 

=F„U*_l,tvl) 

fJN=(u  <>.••-. “n) 

>,N=(>’o->’|.-'  >yN)-  (2) 

Typically,  one  wants  to  maximize  a  performance  criterion 

lN(XN,UN,YN)  (3) 

with  respect  to  UN,  subject  to  constraints  CN(  XN,U^)  =  0. 
Call  /£(*„,(/*,  Yn)  the  maximum.  When  obeys  a  recur¬ 
sion  of  the  form 

lt{XH.Ug,  yn  )  =  /£■,'(  ) 

Jr Pn{*N'uN' Yn)  (4) 

then  dynamic  programming  comes  to  the  fore  and  the 
solution  U£  may  be  generated  recursively  as  the  limit  of 
the  following  sequence  of  solutions: 

u:=sm(u;j,',Y”),  *=1,2  ,•■•,*.  (5) 

The  functional  S„  describes  the  recursion  for  computing 
Thus  the  central  theme  is  to  imbed  the  solution  to  an 
N  stage  problem  in  a  sequence  of  simpler  n  stage  problems. 
When  the  underlying  state  and  parameter  spaces  are  Finite, 
the  solution  algorithm  is  finite-dimensional  and  imple- 


mentable  on  a  digital  computer.  When  they  are  uncounta¬ 
ble,  but  the  function  /  is  quadratic,  then  it  is  still  often 
possible  to  find  a  closed-form  recursive  solution  that  may 
be  programmed. 

A  very  large  class  of  problems  may  be  formulated  as 
before.  Two  particularly  noteworthy  examples  are  the  lin¬ 
ear  discrete-time  quadratic  regulator  problem  in  determin¬ 
istic  and  stochastic  control,  and  Markov  chain  sequence 
estimation  in  additive  noise.  On  the  other  hand,  there  are  a 
great  number  of  problems  that  admit  dynamic  program¬ 
ming  solutions,  but  which  are  not  naturally  formulated  in 
the  style  above. 

One  of  the  points  we  wish  to  make  is  the  following: 
recognizing  that  a  solution  is  a  limit  of  a  sequence  of 
approximants  which  may  be  recursively  computed  is  per¬ 
haps  more  fundamental  than  the  search  for  a  correspond¬ 
ing  optimization  problem.  The  chief  value  of  an  optimiza¬ 
tion  formulation  is  that  it  often  simplifies  the  search  for  the 
recursive  solution  algorithm. 

III.  Dynamic  Programming,  the  DFT,  and  the 
FFT 

The  DFT  certainly  constitutes  one  of  the  cornerstones  of 
modern  Fourier  analysis.  Its  uses  range  over  the  entire 
spectrum  (so  to  speak)  of  signal  processing  applications. 
The  DFT  is  a  mapping,  DFT:  {*„}*- 1  that 

takes  the  sequence  {jcJo-1  'nto  the  sequence  { Xm}"~ 1 
according  to  the  rule 

v- 1 

xm=  2  in  =  0, 1,*  •  •  ,N  —  1 

k-0 

WN  - exp( -jlit/N).  (6) 

Noting  that  W/)v“mN  =  1,  Vm,  we  may  write  Xm  as  follows: 

Xm  =  Ni' (7) 

11  =  0 

This  calculation  may  be  viewed  as  the  limit  of  the  follow¬ 
ing  sequence  of  imbedded  approximations: 

*-i 

*«‘=  2  *=1.2, •••,#.  (8) 

«  =  0 

Note  X*,k)  =  obeys  the  following  recursion: 

xT'=xm 

Ai'>=x0HV"  (9> 

So  Xm  is  obtained  as  the  limit  of  a  sequence  of  approxima¬ 
tions  that  begins  at  X^)=x0W^m  and  terminates  at  X(mN) 
=  X„.  This  is  the  so-called  Goertze!  algorithm  (8)  for 
obtaining  the  mth  DFT  variable  Xm,  as  the  output  of  a 
digital  filter  excited  by  the  sequence  {x,,}*  ' The  output 
of  the  filter  is  read  at  time  k  =  N  (sefc  Fig.  2). 
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Dynamic  Programming  and  the  Decimation-in-Frequency 
FFT 

The  Goertzel  algorithm  is  a  nice  dynamic  programming¬ 
like  solution  for  the  DFT.  However,  it  is  not  efficient. 
Computational  complexity  is  of  order  N2.  l  et  us  see  if  we 
can  improve  upon  it.  Consider  Xjnk)  for  even  frequency 
indices  m  =  2r: 


A  1 

*$;•=  2  *nWN  2rik  n)-  A=  1.2,’  •  ‘  ,N 
*1  0 
A  -  I 


=  2  x„Wy:2'k  k—  1,2.-  •  ■  ,N.  (10) 


n  0 

For  A  even  (say  A  —  2r ). 

2s  1 

x£"  =  2  v 

n  0 

V  1 

2s  1 

=  i  x 

2  V 

n  0 

n  s 

=  2  ****$• 

,v  1 

V  y  It/  r(S  /) 

n  0 

/  0 

(ii) 

This  shows  that  the  two  r-point  DFT  approximant  A')*5' 
may  be  obtained  from  two  s-point  approximants.  By 
choosing  5  =  N/2  and  continuing  backwards  in  this  way 
(for  odd  subindices,  as  well)  one  arrives  at  a  backward 
dynamic  programming  derivation  of  the  decimation-in- 
frequency  FFT  See  Fig.  3  for  an  elementary  representation 
of  a  four-point  decimation  in  frequency  FFT.  The  decima¬ 
tion-in-frequency  algorithm  improves  on  the  Goertzel  algo¬ 
rithm  by  requiring  complexity  on  order  N  log  N. 


Fig.  3.  Four-poinl  decimation-in-frequency  FFT. 

sive  time  series.  The  usual  state-variable  and  matrix  block 
diagrams  give  way  to  scalar  variables  and  digital  Filter 
blocks  of  moving  average  filters.  The  normal  equations  are 
highlighted  and  dynamic  programming  is  used  to  derive 
the  famous  Levinson  recursions. 

A.  Models 

Let  {**}  denote  a  scalar  zero-mean  wide-sense  sta¬ 
tionary  AR  sequence  of  order  p  (denoted  AR(  p))  that 
obeys  the  recursion 

p 

xk  —  2  anxk-»  +*’*>  VAr. 

n  —  1 

w„:  sequence  of  i.i.d.  V(0,  a2 )  random  variables  (r.v.s.).1 

(12) 

It  is  easy  to  see  that  the  covariance  sequence  {rm}00ai, 
rm=r_m-  associated  with  the  sequence  {xk}  obeys  the 
recursion 

p 

rm=  2  anrm-n+°l*n,>  W  =  0,1,  -.  (13) 

n  =  I 

From  here  one  may  write  out  the  so-called  normal  equa¬ 
tions: 


ro  rp-l 

ro 

o;  =  (a,.-  •  -,ap) 

rp=(n,---,rp).  (14) 


RPap  =rP 


Rp  = 


rp- 1 


IV.  Detection,  Estimation,  and  Control  in  the 
AR(  N )  Case:  Kalman  Filters.  Levinson 
Recursions,  and  Dynamic  Programming 


We  note  at  this  juncture  that  turning  rp  upside  down 
turns  the  solution  to  the  normal  equations  upside  down.  To 
see  this,  let 


Autoregressive  (AR)  models  for  signals,  states,  and  data 
play  a  starring  role  in  many  areas  of  signal  processing  and 
control.  By  appropriately  selecting  model  parameters  (and 
order)  one  can  model  the  covariance  structure  and  spectral 
characteristics  of  more  general  models.  The  so-called  nor¬ 
mal  equations  for  identifying  AR  parameters  are  elegant 
and  easily  solved  with  recursions  of  the  Levinson  type. 

In  this  section  we  tie  up  control,  prediction,  detection, 
and  estimation  in  the  special  case  where  we  are  dealing 
with  a  zero-mean  wide-sense  stationary,  scalar  autoregres- 


0  0  •••0  1 
0  1  0 

0 

1  •••  0  0. 


JJ  =  I 


(15) 


denote  the  exchange  matrix  and  note  by  the  Toeplitz 
symmetry  of  Rp  we  have  JRpJ  =  Rp.  Thus 


'Here  and  elsewhere  i.i.d.  stands  for  independent,  identically  dis¬ 
tributed  and  r.v.  stands  for  random  variable. 
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JRPJaP=rp 

RpJaP  =JrP 


(16) 


As  J  turns  vectors  upside  down,  this  proves  the  claim. 

B.  The  Normal  Equations  are  Fundamental 

The  AR  coefficients  a'p  =(a,,- •  -,ap)  that  characterize 
the  sequence  (xA)  are  fundamental  to  the  implementation 
of  control,  prediction,  and  detection  algorithms  on  noisily 
observed  AR  sequences.  Unfortunately,  sequences  rarely 
come  tagged  with  their  corresponding  AR  parameters. 
More  typically  finite  records  of  them  come  to  use  and  we 
estimate  a  covariance  function  (or  power  spectrum),  often 
by  FFT-ing,  squaring  and  windowing,  and  inverse  FFT-ing. 
These  estimates  may  then  be  used  to  solve  for  the  coeffi¬ 
cients  ap  from  the  normal  equations.  This  makes  the  nor¬ 
mal  equations  fundamental  and  arouses  our  interest  in 
efficient  ways  of  solving  them.  The  derivation  that  follows 
is  an  adaptation  of  Bellman's  discussion  of  quadratic  forms 
and  dynamic  programming  in  [9], 

C.  Dynamic  Programming  and  Levinson's  Algorithm 


Use  (20)  in  (22)  to  get  a  different  recursion  for  Qp(rp): 
Qpp  (  rp )  =  min  [  a2p r0  -  2a p rp  +  r0  -  (  rp  _ ,  -  ap  Jrp . . , )' 

’Rp-Arp- 1  ~apJrP- 1 )] 

=  min  [ a2p ( r0  - rp'_ ,  JRpl 

aP 

-  2ap(  rp  -  r;_  ]Rp!l  Jrp  _ ,  +  Q'l ,'( rp .  , )] 

=  min  [«*(r0 -a'l 

-2ap{rp-«>:\Jrp_x)  +  Qr  \(rp  ,)].  (23) 
It  follows  easily  that  the  minimizing  value  of  a  is 


r~  ~ap-\Jrp-\ 


«£  —  - - ~ 


ro  t'ff  i 


(24) 


Equivalently. 

(«;)2('b“«;-i>,  :i)  =  a'(rp-a':}'JrF-,).  (25) 

Substituting  the  solution  of  (24)  into  (23)  we  get  the 
following  recursion  for  Qp(rp): 


Consider  the  quadratic  form 

<?;(*)=(?r/($  \)~ap{rp~aP:\JrP  i 

Qp(rp)  =  r0  —2a’prp  +a’pRpap 

(17) 

=qpp  i(^-i)-(«;)j(»o-«?2ii>,- 

,)•  (26) 

This  quadratic  form  is  minimized  for  some  choice  of  ap 
that  we  denote  app.  It  is  easy  to  see  that 

Comparing  this  with  (20).  we  have 

ap  —q 

p  p 

(18) 

aprP=ap:  iV  i  +ap  (rp  ~°p:  i 'Jrp- 1 ) 

(27) 

where  ap  comes  from  the  normal  equation: 

from  whence,  it  follows  that 

aP=Rr  '  rP  ■ 

(19) 

«;'=(«;  />.«;)• 

(28) 

The  corresponding  minimum  of  Qp(rp )  we  denote  Qp(rp): 


-p'  p  ’ 


Qp(rp)=ro~ap'rP 


This  is  the  recursion  for  updating  ap. 

This  completes  the  Levinson  recursions,  summarized  as 
follows: 


=  rn  —r'R.  r„. 

U  P  P  p 


(20) 


o)  «;= 


The  quadratic  form  Qp{  )  may  be  written  recursively  as 
Qp(‘rp)  =  a2pro-2*Prp+QP  ()■  (21) 

So  minimization  of  Qp(rp)  with  respect  to  ap  may  be 
written 

Qp(f '.)=  min  QpiO 


rp  ~ap-  i'Jrp  - 1 

r0-«p~'‘rp- i 


min  |  ajr0  -  2aprp  +  min  Qp  t(rp  ,-apJrp ) 

ar  i 

=  min [a2pr0- 2aprp+  Qpp  \(rp  y-apJrp  ,)]. 

(22) 

This  equation  contains  the  essence  of  dynamic  programming 
and  the  principle  of  optimality  :  once  the  solution  op  l-  and 
corresponding  minimum  Q£  l.  have  been  found  for  the 
order  ( p  ~  1 )  problem.  a£  may  be  found  as  a  function  of  r0, 
rp,  and  a*  _ At  each  step  of  the  way  the  minimization  on 
ap  is  quadratic. 


(ii)  a;'  =  ,7. «;) 

(ill)  Q^(rp)  =  Q^:Urp.  ,)-(a^)2(r0-a^  }rp  ,). 

(29) 

There  are  no  matrix  inverses  here— only  vector  inner  prod¬ 
ucts.  Thus  the  algorithm  is  0(p2).  Of  course.  a£  is  the 
desired  solution  ap. 

To  show  the  importance  of  the  AR  coefficients  ap  =  afi, 
we  consider  the  following  family  of  problems: 

(i)  noisy  prediction 

(ii)  noise-free  prediction 

(iii)  minimum  variance  control 

(iv)  detection. 

D.  Noisy  Prediction  and  the  Kalman  Predictor 

Assume  the  sequence  (xt)  is  observed  in  zero-mean 
additive  white  Gaussian  noise  (WGN): 


i  ■  ■1|>>sS>.yS '«■ 
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Fig.  4.  Prediction,  detection,  and  control  of  a  noisy  AR(  N  )  sequence. 


**=**+»*. 

nk :  sequence  of  i.i.d.  N( 0,  o„2 )  random  variables.  (30) 


The  companion  form  state  model  is 
+ 1  =  -4  %k  +  ^Wk  + 1 


Zk=c'Xk 


'k~p+i 


0 

/ 

0 

°p  ap -1 

••  «o. 

b  =  c  = 


(31) 


The  stationary  Kalman  one-step  predictor  for  the  noisily 
observed  AR(  p)  sequence  is 

•^*+1  ~AXk  +K(zk  ~xk) 

xk=c'Xk  (32) 

where  K  is  a  ( px  1)  Kalman  gain: 

K  =  APc(c'Pc  +  o^y' 

=  (33) 

P=(A  —  Kc')P{A  -Kc’)'  +  o*KK'  +  o*bb'. 


The  Kalman  prediction  sequence  (i*  +  ))  for  (z*+!}  can 
be  interpreted  as  the  output  of  an  AR  moving  average 
(ARM A)  filter  with  p  poles  and  p-  1  zeros  (denoted 
ARMA(  p,  p-  1)),  driven  by  the  prediction  error  sequence 
(vk  -zk  —  }  or  an  ARMA(p,  p-  1)  filter  driven  by  the 


observation  sequence  {z*}.  The  resulting  filter  equations 
are 

p  p 

**+i  =  2  <*/**+!-, +  2  Sivk+\-i  (34) 


-**+!=  2  +  2  +  (35) 

i= i  i= i 

where  the  coefficients  g,  can  be  defined  by  the  characteris¬ 
tic  polynomial  A(A)  of  (A  -Kc'): 

A(A)=A'+2  (g-ai)\'-‘.  (36) 

i—  1 

See  Fig.  4  for  a  block  diagram  of  this  predictor.  Note  that 
the  noise-free  moving  average  (MA)  predictor  filter,  P{z) 
=  2£=  ]anz  is  preserved  in  the  feedback  loop,  but  that 
the  residual  sequence  vk  =zk—xk  is  now  weighted  with  a 
feedforward  MA  filter,  (?(z)  =  2£=lg„z 
Why  is  the  noisy  Kalman  predictor  ARMA  and  not 
MA?  The  answer  is  that  (zk},  a  noisy  version  of  an  AR 
signal  process,  obeys  an  ARMA( />,  p)  difference  equation. 
As  an  AR(p)  model  has  an  MA(p—  1)  predictor,  it  is  at 
least  logical  (if  not  intuitive)  that  an  ARMA(p,  p)  process 
has  an  ARMA(p,  p—  1)  predictor. 

E.  The  Noise-Free  Predictor 
The  prediction  vector  Xk  consists  of  the  terms 

E[xk-p  +  \/zk-\’Zk  -2<  ’  ’  ] 

:r  (37) 

E[Xk-\/2k-]'  Zk-2<  ’  ’  '  ] 

E[xk/zk-\’*k-l'  •)• 

When  o„2  =0,  then  zk—xk,  Vk  and 

E\xk-n/zk-n'  2k-n  -  l»  ’  ’  ‘  ]  —  E  [xk-n/xk -a*  ‘  ‘  '  ]  —  xk-n> 

n=  1,2,  •  •  • .  (38) 

So  in  this  case  the  prediction  vector  is 
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Xk-p  + I 
Xk-p+2 


xk-\ 
Xk/k  —  1 


(39) 


It  follows  that  P,  the  covariance  E[Xk  —  Xk  ]{ Xk  —Xk\  is 
[0  •••  0  1 


£(H'ADlt|i?t)-0,  it  is  clear  that  E(xk)  is  minimized  by 
choosing 

p 

vk~~  2  «,**-,•  (44) 

i=  i 

This  control  is  illustrated  in  Fig.  4  as  a  feedback  loop 
running  up  the  left  side  of  the  figure.  The  feedback  loop  to 
the  top  “compute”  box  shows  how  xk  would  be  used  for 
minimum  variance  control  in  the  noisy  case. 


P  = 


0 


(40)  G.  Dei  ction  and  the  Likelihood  Ratio 

Consider  the  hypothesis  test  H0  versus  //,  with 


Calculating  K  by  substituting  (40)  into  (33),  we  find  that 
A(  X )  =  \N  and  hence  g,=ar  This  implies,  as  one  would 
expect,  that  the  prediction  filter  of  (35)  reduces  to  the 
purely  MA  relation: 

p 

■**  +  !=  2  «,**+!-,•  (41) 

I  —  I 

F.  Minimum  Variance  Control 


H0:zk~nk,  k  =  0,1, 

:**=**+«*.  k  =  0,\.--,K  (45) 

and  the  data  assumed  stationary  over  the  interval.  This  test 
is  equivalent  to  the  test  H0  versus  //,,  where 

Ho-.r?'  =  zk:N(0,o?) 

//, :  pi":  N( 0,  p0  +o„2 );  p0:  variance  of  xk 

(46) 


One  of  the  simplest  control  strategies  is  minimum  vari¬ 
ance  regulation  where  one  desires  to  minimize  the  variance 
of  the  AR(p)  output  sequence  {x*},  and  force  E(xk)~ 0. 
The  well  known  separation  principle  allows  one  to  generate 
a  feedback  control  strategy  assuming  noise  free  measure¬ 
ments,  i.e.,  nk  =0,  and  then  use  the  same  strategy  in  the 
noisy  case  but  with  the  Kalman  filter  estimates  \xk}  re¬ 
placing  the  actual  filter  outputs  [xk). 

Assume  then  we  have  the  system 


and  vik)-zk—  xk  is  the  innovations  sequence  in  the  Kal¬ 
man  filter.  The  log-likelihood  ratio  for  this  problem  is 
proportional  to 


LR  =  tf- 


P0+°n  k=  0 


2  pk  +  — j  2  z, 

°n  k  =  0 


2 

k  • 


(47) 


Thus  the  statistics  2p*  and  %zk  are  sufficient  and  the 
log-likelihood  ratio  may  be  computed  as  in  Fig.  4. 


xk  =  2  a<xk  ^ +*'*+*’*  (42) 

i=i 

where  {cA}  is  our  feedback  control  sequence.  We  would 
like  to  minimize 

£(**)  =  £:!  2  a,xk~,  +*•'*+»*  | 

=  E  |  wk  +  2 wkvk  +  2*v*  |  S¥h) 

+  2  «!■**-<)  ) 

=  E(wl)  +  2E{E(wkvk/vk)vk) 

+  2£:|w*|  2  «!•**-!  )  ) 


V.  Frequency  Tracking  and  Dynamic 
Programming 

Phase  and  frequency  tracking  problems  comprise  some 
of  the  most  nettlesome  nonlinear  filtering  problems  in  the 
entire  realm  of  signal  processing.  Nonlinear  filtering  and 
MAP  solutions  have  been  reported  recently  by  Bucy  and 
Mallinckrodt  [1 1],  Ungerboeck  [12],  Tufts  [13],  Scharf  et  al. 
[14),  [15],  and  Wolcin  [16].  A  typical  problem  is  the  follow¬ 
ing:  observe  the  signal-plus-noise  sequence  {zA}  with 

zk  ~sk  +nk<  nk  -  sequence  of  i.i.d.  iv(0.  o„2 ) 
random  variables. 

*»=«**  (48) 

and  estimate  the  phase  sequence  {<(>*}  or  some  underlying 
function  of  it.  Here  the  character  assigned  to  state  <f>k  is 
sk  =exp(j<j>k ). 

In  all  that  follows  it  will  be  convenient  to  organize  the 
observed  data  into  contiguous  data  blocks: 


+  £  U+  2  aixk  -i  • 


i~  I 


(43) 


Since  ( }  is  uncorrelated  with  {**-,},  »>1,  and  since 


L«a-i 


z, 
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s,  =  vs.  k 


Fig  5.  Visualizing  the  random  walk  frequency  trajectories 


Think  of  the  ( NKX\ )  vector  Z  as  a  concatenation  of  K 
data  blocks  of  the  form  z,,  each  of  dimension  N. 

The  choice  of  a  model  for  </>k  deternrnes  whether  we  are 
talking  about  phase  or  frequency  tracking,  although  the 
distinction  between  the  two  is  more  imagined  than  real.  In 
[14)  the  phase  was  assumed  to  evolve  according  to  a 
random  walk  phase  model 

<t>k-<t>k  |+W* 

H  j :  sequence  of  i.i.d.  N( 0,  o2 )  random  variables  (50) 

and  a  complete  dynamic  programming  solution  was  pre¬ 
sented.  In  [  15]  a  discontinuous-phase  FM  model  was  as¬ 
sumed  and  a  dynamic  program  algorithm  derived. 

Here  we  assume  <t>k  evolves  according  to  the  continuous- 
phase  FM  rule 

i+w*  (51) 

where2 

_  2ir 

-  "q-'W'VI 

=  ^rf.  iN<k<(l+\)N.  (52) 

In  this  way  uk  is  fixed  at  the  value  (2 ir/Q)v,  for  a  block  of 
N  as  illustrated  in  Fig.  5.  Correspondingly,  <fk  increases  at 
a  fixed  linear  rate  for  N  samples  and  then  adopts  a  new 


linear  rate  for  the  next  N  samples.  Therefore,  we  may  write 
s,  as  follows: 

*,=exp  (j4>lN)d(p,) 


The  phase  <j>lN  is  the  total  accumulated  phase  after  tN  steps. 
It  depends  on  the  entire  history  of  frequency  terms 
»»o,  *„•  •  |  and  obeys  the  recursion 

QiN  ~  ^(r-  l)N  +  N 

Between  tN  and  (/  +  \  )N-  I  the  phase  grows  linearly  as 

=<f>,N+('~'A0^l», 

This  additional  phase  increase  is  accounted  for  in  the 
vector  d(v,).  See  Fig.  5  for  an  illustration. 

To  complete  the  model  we  assume  {?,)  is  a  sequence  of 
discrete  random  variables  that  take  values  in  the  set 
(0, 1,-  •  •.£)—  1 }  and  evolve  according  to  the  rule 

(54) 

where  u,  6  (0. 1.-  •  ,Q~  1 }  and  addition  is  modulo-0.  The 
distribution  of  the  sequence  of  i.i.d.  random  variables  is 
selected  in  such  a  way  that  the  transition  probability 

i)  (55) 

corresponds  to  our  notion  of  physical  reality.  We  may 
think  of  the  resulting  frequency  sequence  {«*}  as  a  finite- 
state  random  walk  on  the  circle  with  an  unusual  transition 
probability  structure.  Typical  trajectories  for  { uk )  and 
(sk)  are  illustrated  in  Fig.  5. 

The  joint  likelihood  function  for  Z  and  {»>,}  is  propor¬ 
tional  to 

K  '  1  t  v  \ 

i =  2  2  •"/’(t1-)-  (56) 

,  =  0  2o2  r=0  'r'- 1' 

Using  our  representation  for  s,  and  dropping  terms  inde¬ 
pendent  of  v,  we  obtain 

2  Re  { exp  ( ~j<PlN  )z',d*(v, ) )  +  2  Inp(~-l 

(57) 

The  term  z’,d*(v,)  is  nothing  more  than  the  DFT  of  z, 
evaluated  at  the  DFT  frequency  (2w/Q)r,.  The  best  way  to 
compute  it  is  to  zero-pad  z,  to  obtain  a  Q  point  sequence 
that  may  be  FFT’ed.  See  Fig.  6. 

Our  notion  of  the  most  likely  sequence  (r,)*  “ 1  is  the 


2  (  ]  denoto  integer  pari  of  (  ) 
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(frequency  stole) 


Fig.  6.  Data  processing  and  frequency  trellis  illustrating  evolution  of  surviving  frequency  tracks. 


sequence  that  maximizes  /.  This  is  the  MAP  sequence. 
Write  the  maximization  problem  as 

max  r*_ ,  (58) 

Wo  ' 

with 


with  1  (peak),  2  (secondary  peak),  and  3  (tertiary  peak)  are 
unreliable  the  estimated  frequency  sequence  (thin  line) 
tracks  the  true  frequency  sequence  (thick  line). 

This  formulation  improves  on  a  heuristic  idea  of  Rock- 
more,  who  was  perhaps  the  first  to  advocate  dynamic 
programming  search  for  likely  frequency  tracks  [17], 


r,  =  T, _ ,  +  -7  Re  {exp ( )z',d*(  v, ) }  +  In  p  ( ) . 

(59) 

So  our  maximization  problem  becomes 


max  max  r*_2+lnp(  1  ) 

W*  1  Mg  ’  \pK  2  I 


+  -  2Reexp(  j<f\K  )^<ac  - d<^*(*'<rc- ,>)  •  (^®) 


Thus  for  each  node  on  Fig.  6  we  evaluate  the  FFT  z',d*(v,), 
phase  it  by  exp(  and  find  the  best  route  through 

the  trellis  with  the  dynamic  programming  algorithm  of 
(61).  This  completes  our  algorithm  for  moderating  the 
usual  peak-picking  rule  on  the  FFT  with  prior  information 
/>(*’,/*',- 1  )•  The  reader  is  referred  to  (14J  for  a  more  com¬ 
plete  discussion  of  a  related  algorithm  for  nonlinear  phase 
tracking. 

Shown  in  Figs.  7  and  8  are  simulations  of  the  algorithm 
running  on  noisy  phase-coherent  FM  data.  The  parameters 
are 


where  CNR  is  the  carrier-to-noise  ratio. 

The  algorithm  is  run  in  a  fixed  lag  mode  [  14]  for  a  lag  of 
60.  The  results  show  that  even  when  FFT  peaks,  indicated 


VI.  Local  Boundary  Estimation  in  Noisy 
Black  and  White  Images 

In  digital  image  processing  one  is  interested  in  develop¬ 
ing  computer  algorithms  which  can  either  automatically 
extract  information  from  pictures  or  at  least  simplify  the 
process  of  manually  interpreting  them.  In  either  case,  a 
basic  step  involves  segmenting  a  picture  into  regions  with 
similar  features  such  as  gray  level  or  texture.  This  involves 
the  estimation  of  region  boundaries.  Boundary  estimation 
algorithms  make  use  of  operators  which  estimate  short 
segments  of  boundaries  using  picture  data  in  small  picture 
sections.  Examples  are  simple  gradient  operators  and  the 
well-known  Hueckei  operator  [18].  An  example  of  a  local 
sequential  estimator  which  is  also  used  for  this  purpose  can 
be  found  in  [19],  In  this  section  we  outline  a  new  dynamic 
programming  algorithm  for  sequentially  estimating  short 
boundary  segments.  We  then  briefly  discuss  an  algorithm 
which  pieces  together  the  short  segments  and  present  some 
examples  of  its  use  on  complete  images. 

A.  Image  and  Boundary  Models 

Let  a  digitized  black  and  white  image  be  represented  by 
a  matrix  with  components  gtJ  corresponding  to  the  gray 
level  value  of  a  picture  element  (pixel)  centered  at  position 
(/,  j).  The  value  gtJ  will  have  two  components— a  true 
picture  component  btJ  and  a  noise  component  ntj  so  that 
gtj  —bt)  +ntJ.  A  picture  is  assumed  to  consist  of  a  single 
region  of  gray  level  rin  lying  in  a  background  of  gray  level 
rMr  so  that  b,t  can  take  on  either  of  the  two  values  rm  or 
r^,.  The  noise  components  ntJ  are  assumed  to  be  indepen¬ 
dent  identically  distributed  Gaussian  random  variables  with 
mean  zero  and  variance  oJ,  denoted  ntJ:  N( 0,o2). 

An  edge  element  is  defined  as  the  line  segment  separat- 
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CONTINUOUS  PHRSE  RRNDOM  WRLK  FM 

MAP  FREQUENCY  ESTIMATION 

LAG*  10  BLOCKS  - DECODED  FREQ. 

R.  NALK  VAR.  =0.01000  - ACTUAL  FREQ. 

CNR  =  -3.0  NORMAL  DENSITY 


(  N.Q  )  «  (8  .32) 


TIME  IbLOCr-SI 

Fig.  7.  Frequency  tracking  at  CNR=  -  3.0  dB.  Random  walk  variance  =  O.OI  rad2. 


CONTINUOUS  PHRSE  RRNDOM  WRLK  FM 

MAP  FREQUENCY  ESTIMATION 

LAG*  10  BLOCKS  - DECODED  FREO. 

R.  WALK  VAR.  =0.10000  - ACTUAL  FREO. 

CNR  =  -3.0  NORMRL  DENSITY 


(  N.Q  )  «  18  .32) 


TIME  ICLC  «71 


Fig.  8.  Frequency  tracking  at  CNR- -3.0  dB.  Random  walk  variance  =  0.1  rad2. 


ing  two  adjacent  pixels,  and  as  shown  in  Fig.  9  a  boundary 
segment  consists  of  a  directed  sequence  of  edge  elements 
{/,}*.  As  illustrated  in  Fig.  10,  we  assume  short  boundary 
segments  to  be  generated  by  constructing  a  sequence  of 
edge  elements  that  terminate  at  the  boundary  of  a  rectan¬ 
gular  box.  Longer  sequences  of  edge  elements  defining 


longer  more  complicated  boundaries  are  obtained  by  exit¬ 
ing  successive  rectangles  Rk,  1  <k<N  containing  pk  - 
kx(2k-2)  pixels.  The  key  constraint  built  into  this  gener¬ 
ating  scheme  is  that  sequences  departing  one  rectangle 
cannot  re-enter  it.  Fig.  1 1(a)  gives  an  example  of  a  boundary 
which  is  consistent  with  this  model  while  Fig.  1 1(b)  shows 


SCHARF  AND  ELLIOTT:  SIGNAL  AND  IMAGE  PROCESSING 


1027 


j  j  +  l  J+2  j+3  j+4  j+5 


i  +  l 
1+2 
i+3 
i+4 
i+S 


V 

« 

m 

te 

V 

W 

t/4 

m 

p 

w 

w 

ip 

w 

jf 

m 

m 

m 

m 

Fig.  9  A  boundary  segment  in  small  picture  segment. 
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k  •  I 

R. 


: 

- 

Fig.  10. 


Example  of  boundary  segment  generation. 


Fig.  II.  Example  of  a  boundary,  (a)  Consistent  with  model,  (b)  Incon¬ 
sistent  with  model. 


Fig.  12.  CAT  scan  of  abdominal  section  of  human  body. 


a  similar  but  inconsistent  boundary.  In  the  latter  case  the 
edge  sequence  reenters  R4.  Although  this  scheme  restricts 
somewhat  the  types  of  boundary  segments  that  can  be 
generated,  it  is  still  very  reasonable  for  region  boundaries 
with  low  and  slowly  varying  curvatures  such  as  those  in  the 
body  computerized  axial  tomography  (CAT)  scan  shown  in 
Fig.  12.  The  maximuni  rectangle  size  pN  is  assumed  fixed  a 


w®  wT  v0  v9  ulO  ull  v,  1 2 
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Fig  13.  Possible  state  locations  for  *  =  1,2,  ••  -.5. 


Fi 


g.  14.  State  transition  dia 
Characters  C4.  k  =  1,2,-  - 
t.2,  -,5. 


;ram  for  k-  1,2,  •  ■,$  illustrating  a  set  of 
■  for  a  specific  process  realization  xk.k  = 


priori  and  is  a  function  of  the  boundary  curvature  proper¬ 
ties  for  the  region  of  interest. 

Boundary  segments  generated  by  such  a  model  are  natu¬ 
rally  represented  oy  a  sequence  of  states  in  a  Markov  chain 
where  the  index  parameter  k  for  the  rectangle  of  size  pk  is 
also  the  index  parameter  for  the  Markov  process.  A  process 
state  xk  at  “time”  k,  will  correspond  geometrically  to  the 
end  point  of  a  boundary  sequence  passing  out  of  Rk.  Fig. 
13  shows  all  possible  locations  for  xk,  denoted  as  x{.  when 
k=  1,2,  -,5.  The  number  of  possible  states  at  time  k  is  1 
fork=l,  3  fork=2,  and  9+4(k  — 3)  fork>3. 

Note  that  there  is  only  one  edge  sequence  between  any 
two  states  xi_ ,  and  x{  which  is  consistent  with  the  genera¬ 
tion  model  and  which  does  not  pass  through  another  state 
xk.  As  a  result,  a  boundary  segment  uniquely 

characterized  by  a  state  sequence  {xk}*. 

Fig.  14  contains  an  abstract  representation  of  a  typical 
realization  of  the  Markov  process,  together  with  a  descrip¬ 
tion  of  the  picture  or  character  Ck  associated  with  each 
state.  The  observed  image  will  be  a  noise  corrupted  version 
of  each  such  picture. 

If  the  regions  of  interest  have  smooth,  low  curvature 
boundaries  then  a  reasonable  rule  for  assigning  transition 
probabilities  p(xt|xt_,)  is  to  choose  p(xk\xk-{)  to  be 
inversely  related  to  the  distance  (measured  in  edge  de- 
menu)  between  states  x*_,  and  xk.  We  must  also  impose 
the  total  probability  constraint  that 

9+4<*-J) 

2  («2) 

7=1 
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B.  A  Dynamic  Programming  Algorithm  for  Estimating 
Boundary  Segments 

Using  the  pixel  data  in  an  NX2(N—  1)  block,  R„,  we 
next  formulate  a  dynamic  programming  algorithm  for 
estimating  the  most  likely  state  sequence  consistent  with 
the  generation  model.  The  algorithm  is  optimal  in  the  sense 
that  it  finds  the  state  sequence  that  maximizes  the  joint 
likelihood  of  the  data  in  Rs  and  the  corresponding  edge 
sequence  through  RN. 

To  begin  we  first  define  the  pixel  data  sets 

D,  =  {g„:  Pixel  (i,j)CRk) 

dk  =  {g,, :  pixel  (i,j)CRk,  pixel  ( ».  j )C Rk  , } - 

This  implies  that  Dk  =  Ud*,  D,  =  empty  set.  This 
recursion  is  essential.  Next  let  /( ■ )  denote  a  log-likelihood 
function,  and  Sv~  {•*<}*  denote  a  boundary  state  se¬ 
quence  of  length  N.  Then  /( DN,  SN ),  the  joint  log-likelihood 
of  a  boundary  tale  sequence  and  the  picture  data,  must 
satisfy 

l(DN,SN)  =  l(DN\SN)  +  l(S„)  (63) 

where  l(SN)  is  the  log-likelihood  of  the  state  sequence  SN 
and  /(  Dn  \Sn  )  is  the  pixel  data  log-likelihood  conditioned 
on  the  boundary  {r,}^  described  by  SN.  Since  the  state 
sequence  S„  is  a  Markov  chain  we  can  use 

I(  i  )  +  ln  p(xk  I-X*  -  i ) 

/(S,  )  =  /(*,  )  =  ln />,(*,)  (64) 

where  P,(xt)  is  the  probability  of  a  particular  starting  state 
x,.  Since  boundary  edge  sequences  are  prohibited  from 
reentering  rectangles  they  have  already  passed  out  of,  we 
can  express 

l(DN\SN)  =  l(DN_l\S„.i)  +  l(dN\xN)  (65) 

where  l(dN\xN)  is  the  log-likelihood  of  the  data  added  in 
extending  the  state  sequence  S*.  ,  to  SN  conditioned  on 
the  specific  new  state  xN.  Substitution  of  (65)  and  (64)  into 
(63)  leads  to  the  following  recursive  expression  for 
l{DN,  Sn)\ 

/(  Dn,  Sn)=I(Dn^,  Sn-\  )  +ln  p(xN\xN_  |)  +l(dN\xN ). 

(66) 

The  transition  probabilities  />(xjxt_,)  can  be  calculated 
using  a  distance  rule  such  as  the  one  discussed  above,  while 
incremental  data  log-likelihoods,  l(dk\xk)  can  be  calcu¬ 
lated  by  observing  that  the  pixel  gray  level  values  gtJ  are 
Nir^.o1)  if  gtJ  lies  inside  the  region  and  Af( o2)  when 
gtj  lies  outside  the  region.  Furthermore,  once  xk  has  been 
specified,  all  pixel  values  gtJ  in  dk  can  be  associated  with 
pixels  either  inside  of  or  outside  of  the  region.  Hence  if  we 
define 


/c(*)=  ~=^exp(-x2/2a2)  (67) 


(b) 


1 


(c) 

Fig.  15.  Examples  of  algorithm  performance  on  complete  objects,  (a) 
Ellipse  with  additive  Gaussian  noise  such  that  (r„-roul)/*=l.  (b) 
Lung  section  of  human  body,  CAT  scan,  (c)  Satellite  image  of  a  storm 
cloud. 


l(dk\xk)=  2  ln/c(&,-'in) 

+  2  In /c(  -'■«.«) 

—  C—  Y  rin) 

O.yie/* 


2 

0.»e/M 


( roul) 

2e2 


(68) 

where  C  is  a  constant  which  is  independent  of  the  choice  of 
xk,  and 


Irk  =  {('-  J)-  P'xel  (f,  j)  is  in  region  and  gu  edk } 

hk  =  {(/,/):  pixel  (i,  y  )  is  in  background  and  g(y &dk } . 

Finally,  a  dynamic  programming  algorithm  for  estimat¬ 
ing  a  state  sequence  SN  and  hence  a  boundary  edge  se¬ 
quence  (t, }*  which  maximizes  l(DN,  SN)  can  be  derived  by 


we  can  use 
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observing  that 

max  /( Ds ,  SN )  =  max  f  max  /( DN_ , ,  , ) 

X»  lSA-l 

+lnp(^|jcA,_,)+/(</A,|x^)J. 

This  boundary  segment  estimator  has  been  incorporated 
into  a  complete  boundary  estimation  scheme.  At  the  last 
stage  of  the  forward  dynamic  programming  algorithm  the 
three  most  likely  states  xN  are  used  to  generate  three 
complete  paths  out  of  a  rectangle.  These  are  stored  as 
nodes  on  a  tree,  and  an  A*  types  tree  search  algorithm  [20) 
is  used  to  piece  together  complete  boundaries.  Fig.  IS 
shows  some  examples  of  the  overall  algorithm  perfor¬ 
mance.  Fig.  15(a)  shows  the  algorithm  performance  on  an 
ellipse  imbedded  in  Gaussian  noise  such  that  the  signal-to- 
noise  ratio  {rin-rcni)/o=  1.  Both  the  actual  and  estimated 
boundaries  are  plotted.  Fig.  15(b)  shows  the  boundary 
obtained  for  a  section  of  the  CAT  scan  given  in  Fig.  12, 
and  Fig.  15(c)  shows  the  result  of  estimating  the  boundary 
of  a  satellite  image  of  a  cloud. 

VII.  Concluding  Remarks 

By  constructing  finite-state  Markov  chains,  and  assign¬ 
ing  characters  or  observations  to  these  states,  one  can 
model  such  things  as  continuous-phase  FM  signals  and 
random  boundaries  in  black  and  white  images.  The  model 
may  then  be  used  to  construct  a  likelihood  function  that 
may  be  written  recursively  and  maximized  with  the  tech¬ 
niques  of  dynamic  programming.  The  resulting  algorithms 
are  tractable  by  nonlinear  filtering  and  image  processing 
standards,  and  the  results  often  superior  to  what  can  be 
achieved  with  other  approaches. 

A  great  variety  of  signal  and  image  processing  problems 
may  be  phrased  along  the  lines  of  this  paper,  and  solved 
using  the  techniques  of  dynamic  programming.  This  chain 
illustrates  once  again  the  great  power  of  dynamic  program¬ 
ming  as  a  recursive  optimization  device. 
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BRIEF  OUTLINE  OF  RESEARCH  FINDINGS 

We  are  pursuing  research  on  three  distinct  but  related  problems: 

(1)  phase  model  extension  to  include  random  phase  modulation,  random  FM 
modulation,  and  random  chirp  modulation;  (2)  frequency  estimation  in 
signal-plus-noise  and  autoregressive  models;  (3)  dynamic  programming 
algorithm  development  for  FM  tracking;  and  (4)  simultaneous  phase  tracking 
and  data  decoding  on  random  phase  channels. 

(1)  Phase  Model  Extension:  Here  we  have  derived  phase  models  for  random 
phase,  random  FM,  and  random  chirp  modulation.  Each  model  is  a  Markov  chain 
defined  on  cyclic  group.  Covariance  and  spectral  results  have  been  derived. 
The  results  -  not  yet  published  -  generalize  existing  results  on  the  spectral 
theory  of  chains,  and  leave  us  with  the  problem  of  selecting  states,  transi¬ 
tion  probabilities,  and  "run  lengths"  to  achieve  model  matching  with  more 
conventional  models. 

(2)  Frequency  Estimation:  We  have  derived  maximum  likelihood  frequency 
estimators  and  Cramer-Rao  bounds  for  estimating  frequency  in  complex  normal 
signal-plus-noise  and  autoregressive  models.  The  estimators  have  been  simu¬ 
lated  and  modulo-2ir  errors  studied.  The  results  explode  a  currently  popular 
myth  regarding  frequency  tracking  at  low  signal- to-noise  ratios.  Work  will 
probably  be  published  shortly. 

(3)  Dynamic  Programming  Algorithm  Development:  In  reports  (a)  and  (b) 
under  item  7  of  this  document  we  have  derived  a  dynamic  programming  algorithm 
for  picking  the  optimum  frequency  track  through  a  sequence  of  contiguous  FFT 
maps  to  decode  the  MAP  frequency  sequence.  Algorithm  properties  are  under 
study  and  software  development  will  begin  soon. 

(4)  Simultaneous  Phase  Tracking  and  Data  Decoding;  A  principle  of  opti¬ 
mal  ity~for—phase— traclcing7datadecoding-has-beenderived  and  implemented  in 
software  to  decode  data  symbols  transmitted  over  random  phase  channels. 
Algorithm  performance  is  treated  in  report  (c)  under  item  7  of  this  report. 
The  algorithm  -  though  complex  -  outperforms  all  competitors. 
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The  last  progress  report  contained  a  complete  list  of  accomplishments  and 
ongoing  work.  That  outline  remains  in  force,  with  the  addition  of  the 
following 

a.  Phase  Model  Extension.  We  are  in  the  process  of  writing  up  our  work 
on  phase  models  on  the  circle.  This  work  could  lead  the  way  to  filtering 
on  finite  groups,  a  topic  I  raised  to  ARO  in  a  letter  to  Suttle  a  year  ago. 

b.  APMA  Systems.  We  have  reformulated  the  autoregressive  moving  average 
(ARMA)  modelling  problem  in  terms  of  linear  transformations,  rather  than 
linear  filters.  It's  too  early  too  give  a  prognosis,  but  new  insights 
are  developing.  An  invited  paper  for  IEEE  Trans  on  ASSP  is  in  progress. 
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June  11,  1980 


Dr.  Jimmie  R.  Suttle,  Director 
Electronics  Division 
U.S.  Army  Research  Office 
P.0.  Box  12211 

Research  Triangle  Park,  NC  27709 
Dear  Dr.  Suttle: 

Here  is  my  brief  report  on  scientific  accomplishments. 

PROJECT:  Viterbi  Tracking  of  Randomly  Phase  Modulated  Data 
DAAG29-79-C-0176 

OUTLINE:  At  Colorado  State  University  the  principal  investigator  and 
his  associates  are  working  on  a  nonlinear  smoothing  theory  for  randomly 
phase-  and  frequency-modulated  information.  The  investigator's  phase 
tracker  has  been  generalized  to  a  frequency  tracker.  Simulation  and 
theoretical  performance  evaluations  are  in  progress.  Analytical  investi¬ 
gation  of  Markov  chains  as  approximants  to  FM  signals  is  proceeding. 

Application  of  these  results  arise  in  1)  detection  and  estimation 
of  feeble  sinusoidal  signals  (such  as  oscillation  modes),  2)  phase 
synchronization  of  data  transmission  systems,  and  3)  decoding  of  fre¬ 
quency-hopped  FM  signals. 

INTERACTIONS:  Irv  Kullback,  U.S.  Army  personnel,  and  USC  Research  group 
at  Ft.  Monmouth,  May  29,  1980.  (Meeting  organized  by  Dr.  Wm.  Sander  , 
ARO). 


Louis  L.  Scharf 
Professor 

Electrical  Engineering 

LS :  f  r 


Recent  CXit.standi.nq  Accompl isl unt?n l  :  l>A/u::',*-7'i-r-()17(. 

July  12,  1982 


The  problem  of  FMs demodulation  has  a  lonq  history  of  re..ourch  and 
development  in  electrical  engineering.  In  its  modern  form  the  [>roblan 
is  to  estimate  phase  or  frequency  sequences  from  noisy  data  and  Co  use 
these  estimates  in  conventional  and  spread  spectrum  ccnnunicatiun  systems. 

The  principal  investigator  and  his  associates  have  developed  models 
for  random  phase  and  frequency  sequences  and  derived  likelihood  egressions 
for  noisy  observations  of  them.  The  investigators  have  applied  dynamic 
programming  to  find  an  algorithm  for  computing  the  maximum  of  the  likelihood. 
The  algorithm  has  been  applied  tfc  the  decoding  of  binary,  phase-sh i f t-keyed , 
and  quadrature-shift-keyed  data  sequences. 

The  results  of  this  research  suggest  that  there  are  numerous  nonlinear 
filtering  problems  in  signal  and  image  processing  that  can  be  formulated 
and  solved  as  nonlinearf  sequence  estimation  problems.  Among  the  possibilities 
are  boundary  estimation  in  noisy  black  and  white  images,  tomouraphic  image 
reconstructive  in  noisy  CAT  Scans,  and  veliicle  tracking  from  incomplete  and 
noisy  measurements . 
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The  Principal  Investigator  attended  the  following  ARO-sponsored  meetings 
at  fort  Monmouth: 

Spread  Spectrum,  29  May  1980 
Fort  Monmouth,  New  Jersey 


Spread  Spectrum  Seminar,  22  May  1981 
Fort  Monmouth,  New  Jersey 


At  the  22  May  1981  meeting  he  presented  a  paper  titled, 

"Viterbi  Tracking  of  Randomly  Phase-Modulated  Date" 


TENTATIVE  AGENDA 


SPREAD  SPECTRUM  SEMINAR 
Fort  Monmouth,  NJ 

May  22,  1981 


0830  Amy  Presentations 

1030  Break 

1045  Spread  Spectrum  Receiver  Using  SAW  Devices 

Prof.  Pankaj  Das,  Rennselaer  Polytechnic  Insitute 

1130  Viterbi  Tracking  of  Randomly  Phase  Modulated  Data 
Prof.  Louis  Scharf,  Colorado  State  University 

1215  Lunch 

1315  Research  in  Digital  Communications 

Prof.  Robert  Scholtz,  University  of  Southern  California 
Prof.  William  Lindsey,  University  of  Southern  California 

1445  Break 

1500  Spread  Spectrum  Communications 

Prof.  Michael  Pursley,  University  of  Illinois 
Prof.  Robert  McEliece,  University  of  Illinois 


1630  Closing 


